Fat on the Verge of a Nervous Breakdown

ZENG ET AL., PAGE 84 

Leptin action in the brain stimulates sympathetic neurons that innervate fat
deposits, forming neuro-adipose junctions that break down fat. Local optogenetic
activation of these neuronal terminals promotes fat loss and could be used to
circumvent central leptin resistance.



Building a Bigger Brain 

POLLEN ET AL., PAGE 55 

Radial glia in the ventricular and outer subventricular zones show molecular
differences at the single-cell level, and transcriptomic analysis suggests that outer
radial glia generate a self-sustaining proliferative niche that supports primate
brain expansion during development of the cerebral cortex.



Looking Human 

PRESCOTT ET AL., PAGE 68 

Epigenome and transcriptome profiling of in vitro-derived human and chimpanzee cranial neural crest cells allows for
exploration of recent changes in the cis-regulatory landscape underlying human craniofacial evolution.



Microbiota on a Gastro-Tour 

DEY ET AL., PAGE 95 

A mouse model of short-term dietary changes, mimicking what happens when humans travel to places with different culinary 
traditions, reveals how a single food ingredient like turmeric can, in combination with microbially generated biomolecules, 
regulate host physiology. 



Liquified Mitotic Forces 

JIANG ET AL., PAGE 108 

A protein associated with mitotic spindle transitions coalesces into a liquid droplet to promote microtubule polymerization and 
spindle assembly, suggesting that the biophysical properties associated with liquid demixing may shape the characteristics of 
a hypothesized but elusive spindle matrix. 



When Stress Relief Gets Sticky 

MOLLIEX ET AL., PAGE 123 

RNA-binding proteins with low-complexity sequence domains drive liquid
phase separation to form stress granules within the cytoplasm; however, if
the granules persist, pathological protein fibrillization results.



Nuclear Organization Cell by Cell 

KIND ET AL., PAGE 134 

Looking at chromosome-lamina interactions in single cells reveals cell-to-cell
variation in interphase chromosome architecture and extensive intrachromosomal
coordination of nuclear lamina contacts.
Central Circadian Control for Plants

TAKAHASHI ET AL., PAGE 148 

Plants harbor a clock in their shoot apex that functions like the mammalian 
suprachiasmatic nucleus, acting to couple and synchronize rhythms in distal 
organs. 



Striking at the Heart of Triple-Negative Breast Cancer

WANG ET AL., PAGE 174 

Triple-negative breast cancer’s strong dependence on the transcriptional kinase
CDK7, which drives expression of a cluster of cancer-promoting genes,
suggests a potential new therapy.



Decoder Ring for Mutations in Cancer Signaling 

CREIXELL ET AL., PAGE 187 
CREIXELL ET AL., PAGE 202 

Determining the residues that drive the specificity of kinases and of SH2 domains that bind phosphorylation sites paves the 
way for a systematic interpretation of mutations on signaling networks. Applying this approach to genomic variants in cancer 
reveals the many ways in which signaling networks can be rewired, including the creation or destruction of phosphorylation 
sites. 



Monitoring Methylation Changes in Single Cells 

STELZER ET AL., PAGE 218 

A clever reporter system indicates DNA methylation status in single cells and how it changes over time in vivo. 



The Silence of the Proviruses 

YANG ET AL., PAGE 230 

Proviral silencing is a characteristic of the pluripotent state, and the precise expression of endogenous retroviruses is critical for
embryogenesis and development. Identification of cellular factors and mechanisms involved in retroviral repression in embryonic
stem cells provides key insights into these processes.



Chromatin Architecture in the Brain 

LINHOFF ET AL., PAGE 246 

Interrogation of chromatin architecture at high resolution in complex tissues
such as the brain is made possible through the combined analysis of epigenetic
modifications, intranuclear localization of specific DNA sequences, and
high-resolution segregation of nuclear compartments using advanced array
tomography imaging.



The FAKts about Tregs in Cancer 

SERRELS ET AL., PAGE 160 

Nuclear focal adhesion kinase (FAK) regulates transcription of chemokines
that drive recruitment of tumor-associated regulatory T cells (Tregs), which promote
oncogenic growth by inhibiting cytotoxic CD8+ T cells. A FAK inhibitor helps to
deplete Tregs and initiate anti-tumorigenic responses.
Taking a Healthy Interest in Translational Research



It is now readily possible to determine, often in exquisite detail, 
the ways in which a single cell or a single individual differs from 
another. Although we are rapidly figuring out how these cellular 
features are altered in disease states, pulling medical advances 
from this haystack of facts remains a challenge. Yet, the view 
from Cell on prospects for medical advancement in this era of 
integrated “omics” could be summed up in a single word:
optimistic. From our daily engagement with researchers in pursuit
of the most exciting scientific discoveries, we feel a palpable 
acceleration of interest and enthusiasm in translational work 
occurring within the community. 

Cell has a key role in fostering this essential dialog between 
the research and medical fields, and we’re eager to continue 
pushing the boundaries of how we think about disease and its 
treatment. Papers that resonate with both communities come 
in many varieties, and you can find examples in our recent issues 
that include enhancing transplantation efficiency for stem cells
derived from cord blood (http://www.cell.com/cell/abstract/S0092-8674(15)00574-7),
using organoids for personalized drug screening
(http://www.cell.com/cell/abstract/S0092-8674(15)00373-6), suggesting new
combination treatments in cancer immunotherapy
(http://www.cell.com/cell/abstract/S0092-8674(15)01040-5 and
http://www.cell.com/cell/abstract/S0092-8674(15)01028-4), and exploring
antibody-based treatments for cachexia
(http://www.cell.com/cell/abstract/S0092-8674(15)01045-4). Also, as part of our
commitment to the dialog across
the medical-research divide, Cell Press has recently joined 
forces with The Lancet group of journals to support the launch 
of an exciting new open access journal, EBioMedicine, with 
the aim of creating a forum to bring a community of scientists 
and physicians together toward the shared goal of improving 
human health. 

As we expand into more translational and clinical arenas, we
need to educate ourselves about publication standards and
concerns in more clinical fields while building our pool of qualified,
rigorous reviewers. For example, we find that many disease-oriented
studies increasingly include testing of compounds or combination
therapies in mouse models, and a major challenge for these efforts
is improving reproducibility and the success rate of new prospective
treatments through the pre-clinical and clinical development
pipeline. Glenn Merlino and colleagues on page 39 of this issue
tackle the problem of improving the ability of pre-clinical cancer
models to predict successful outcomes in human trials and highlight
new approaches. We are also observing an increasing number of
studies crossing our transom that take advantage of patient samples
and other human data to support a proposed target’s therapeutic
potential. For these, we need to pay special attention to issues of
consent and patient confidentiality and to the importance of study
design and the use of appropriate statistics in assessing clinical
data. Overall, the engagement of our community of researchers
with the therapeutic development pipeline is robust and
strengthening. With the accelerating pace at which new
discoveries are informing new potential treatments, it is now possible to
envision a time in the not-too-distant future when a paper in Cell
could include both basic insights and results from randomized
human clinical trials.

In the world of translational research, a landmark event is
governmental approval of a new drug or therapeutic, which
directly impacts how a disease is treated and offers new hope
for patients. To celebrate these achievements and help promote
the conversation between basic scientists, pharma/biotech, and
physicians, we’ve recently launched a new format called “Bench
to Bedside.” On page 17, you will see our latest feature, which
looks at Orkambi, a combination therapy that promotes the
correct folding of the mutated channel that causes cystic fibrosis.
Following its approval, up to 50% of patients are expected to
benefit, marking a major milestone for a difficult-to-treat
disease and highlighting protein folding as a medically targetable
aspect of cell biology. All of this stems from the discovery
more than 20 years ago that the disease is caused by mutations
in a channel protein that impair its trafficking to the cell surface
and promote its degradation
(http://www.cell.com/cell/abstract/0092-8674(90)90148-8).

For the bench researcher, Bench to Bedside will keep you
abreast of exciting developments in new drugs and biologics.
For the clinician, we aim to provide a succinct reference that
conveys how the treatment works and the biological discoveries that
made it possible. The timelines of key discoveries that feature in
each Bench to Bedside hammer home the societal impact of
sustained investment in research, and we hope that these will
spark pride in what the research, pharma, and medical
communities have together accomplished.

As a community, it is essential that we make clear to the public 
and to policymakers the ways in which basic and translational 
discoveries contribute to new treatments. On page 21 of this 
issue, Sanders Williams and colleagues present a Commentary 
that advocates for greater use of data mining and network
analysis in assessing the individual and institutional contributions to
recently approved treatments. They create a new bioinformatics 
tool, and using the example of ivacaftor (a component of 
Orkambi), they establish that the timeline of key discoveries in 
the cystic fibrosis field goes back nearly six decades and includes 
efforts from more than 2,000 researchers. The analyses also
provide insight into the most influential researchers and institutions
for a given breakthrough, offering quantifiable validation for why 
long-term thinking matters when it comes to research funding. 

When Cell was founded four decades ago, the molecular
biology revolution was exploding, powered by recombinant
DNA technologies. This is when Cell established the roots that
still nourish its pages today. Staying at the cutting edge ever
since has led us into new areas, and today one of these areas
that is rapidly expanding, powered by new omics technologies
and mechanism-based drug development, is translational
research. Being explicit in our advocacy of and interest in
translational work risks leading some to fear that Cell “won’t
like my paper because it doesn’t have a translational message.”
We’ll stop you right there. For one, we’ve had a longstanding
interest in translational potential as one of the many elements
considered when evaluating a study, and so this editorial
perspective is not new. More to the point, conceptual impact,
from whatever source or discipline, is our guiding principle,
and we will continue to aggressively pursue and publish what
is most exciting in basic biological research. Even as we remain
firmly grounded by our roots, we see in the current era a lifting of
the fog between basic biological insight and clinical treatment
that is making the shorelines of each visible to the other. This
process will shape how these disciplines cross-pollinate, and
as a journal, we are dedicated to amplifying and enhancing the
speed of this exchange.

The Cell editorial team 
For many of us, salicylic acid may conjure up teenage
memories of waking up to a huge pimple taking form on our face,
just in time for picture day or the school dance. Perhaps
fewer of us are aware that, like many medicinal compounds,
this acne-fighting molecule comes from plants. In fact,
salicylic acid is a well-studied plant hormone that plays key
roles in a variety of fundamental physiological processes,
including growth, stress responses, and, perhaps most
notably, the plant’s innate immune response to microbes. A
new study now shows that salicylic acid also plays a key
role in shaping the root microbiome (Lebeis et al., 2015).

Like us, plants have a complex relationship with the bacteria
in their environment (Sloan and Lebeis, 2015). Exactly how
host-microbiome communication is mediated is still unclear,
but in animals, the host immune system closely regulates the
community of microbes resident in the gut (Belkaid and
Hand, 2014). Just as animals are exposed to a variety of
bacteria through their guts, where nutrient absorption takes place,
plants encounter a huge assortment of soil bacteria through
their root systems. However, as is the case for the gut
microbiome, only a limited selection of bacteria actually takes up
residence in the roots (Lundberg et al., 2012; Bulgarelli et al., 2012).

A recent study from Jeff Dangl and colleagues now shows
that plants cultivate the bacteria in their root systems, tending
some and weeding out others (Lebeis et al., 2015). By using
Arabidopsis mutants in which salicylic acid signaling was either
disrupted or constitutive, the authors found that the
phytohormone altered the bacterial communities colonizing the root.
Remarkably, this shift reflected a phylum-level regulation
rather than just changes in specific species of bacteria,
suggesting that the overall structure of the root microbial
community is controlled by salicylic acid. In further bacterial
colonization experiments analogous to those carried out in germ-free
animals, Lebeis et al. reconstructed a synthetic microbial
community in sterile seedlings grown in artificial soil and showed
that salicylic acid directly inhibits growth of some bacteria
but promotes the growth of others. In fact, one type of bacterium
carrying salicylate metabolism genes was able to grow on
minimal media with salicylic acid as the only carbon source,
suggesting that the plant hormone directly feeds its growth.

In addition to providing a potential entry point for improving
crop production through microbiome modulation, the study
raises broader questions about host-microbiome
relationships. The finding that a single molecule can shape the
taxonomic structure of the microbiome is intriguing; given that
salicylic acid is so crucial for regulating systemic functions
in plants, it will be interesting to see whether there are
analogous systemic mechanisms in animals.

Moreover, given that we ingest plants and that diet affects
the gut microbiome (Faith et al., 2011), how might this
relationship between plant compounds and bacteria play out
inside us? Soil bacteria are remarkably effective at colonizing
the gut in mice, outcompeting even gut bacteria from other
organisms (Seedorf et al., 2014). Do soil-derived bacteria
that have evolved to be responsive to plant innate immune
hormones reside in our own gut? If so, what happens when
we eat plants, thereby introducing the compounds that
regulate these bacteria into our bodies? What effects might this
interaction in turn have on our own innate immune function
and its regulation of the gut microbiome?

Arabidopsis plant growing in microbe-rich soil (image from iStock.com/dra_schwartz).

It should also be noted that aspirin, which has
anti-inflammatory, anti-diabetic, and anti-cancer properties, converts to
salicylic acid in the stomach. Although salicylic acid has been
shown to activate AMPK, thereby increasing fat oxidation, its
anti-diabetic effects are observed even in AMPK mutant mice
(Hawley et al., 2012), suggesting an alternative mechanism.
Given the effects of salicylic acid on the root microbiome,
might some of its health consequences in animals be
mediated in part through microbes in the gut? Ultimately,
understanding the molecular mechanisms underlying
host-microbiota communication will provide the framework for
addressing important questions in human health and disease.
This year marks the 150th anniversary of the presentation by Gregor Mendel of his studies of plant
hybridization to the Brünn Natural History Society. The nature and meaning of these studies have been discussed
many times. However, on this occasion, we reflect on the scientific enterprise and the perception of
new discoveries.



Moche, Ming, and Mendel 

Mendel sought and applied principles of
probability to genetic ratios to develop a
“law” describing the behavior of factors
for plant characteristics over generations.
It was known at the time from horticultural
work and from plant and animal domestication
that hybrids would “revert” to
grandparental forms in their progeny. Mendel
(Mendel, 1866) alluded to such findings
in the Introduction to his seminal paper.
Charles Darwin (Darwin, 1868) also noted
in his 1868 book that “crossed forms of the
first generation are generally nearly
intermediate in character between the two
parents, but in the next generation the
offspring commonly revert to one or both
of their grandparents and occasionally to
more remote ancestors.”

Indeed, it is potentially the
case that “genetic ratios”
were observed by many over
the millennia. Walt Galinat
(Galinat, 1998) noted that the
Moche, who populated the coast of
present-day Peru centuries ago and
who were renowned for their
ceramic creativity, included among
their many designs images of
four maize plants with different
characteristics in 3:1 or 1:1
ratios. Did this indicate a
familiarity with the basics of
genetics? The skeptic would note
that any difference among four
plants will produce 3:1 or 1:1
ratios. But why four plants?
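The skeptic’s point is simple combinatorics. As a minimal sketch, assume only that each of four plants falls into one of two visible classes. The labels “A” and “B” below are hypothetical placeholders, not Moche categories. Enumerating every possible assignment shows that whenever the four plants are not all alike, the split can only be 3:1 or 2:2 (that is, 1:1):

```python
from collections import Counter
from itertools import product


def split_counts(n_plants: int = 4) -> Counter:
    """Tally the two-class splits across every way of assigning n_plants
    to the hypothetical phenotype classes "A" and "B"."""
    tally = Counter()
    for assignment in product("AB", repeat=n_plants):
        a = assignment.count("A")
        b = n_plants - a
        # Store the larger class first, so 3 A + 1 B and 1 A + 3 B both count as (3, 1).
        tally[(max(a, b), min(a, b))] += 1
    return tally


if __name__ == "__main__":
    total = 2 ** 4
    for (major, minor), count in sorted(split_counts(4).items()):
        print(f"{major}:{minor} in {count} of {total} assignments")
```

The script prints 2:2 in 6, 3:1 in 8, and 4:0 in 2 of the 16 assignments. Only the 4:0 case shows no difference at all, so any depiction of four plants that differ is bound to show one of the two ratios.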

Whoever in China assembled the silkie
chicken (wu gu ji) variety (Dorshorst et al.,
2010), with its set of bizarre single-gene
characteristics such as fluffy feathers, black
skin and bones, blue earlobes, rose comb,
polydactyly, feathered legs, and short tail
feathers, must have understood genetic ratios
and how they operate in making combinations.
The variety was first written about from the
travels of Marco Polo around the time of the
Ming Dynasty but was likely generated well
before then. No doubt there could be other
examples. However, Mendel was the first to
publish an attempt to attribute a significance
or “law” to such ratios!

Some have questioned whether Mendel
knew the actual significance of his
work (Endersby, 2007) and argued that Correns,
de Vries, and Tschermak, who made
similar findings in 1900, should rightfully
be declared the fathers of genetics. Yet,
Mendel did recognize a pattern where
others did not; he recognized that there
were factors following these patterns
that determined characteristics of
organisms; he tried to rationalize how
inheritance in general could be explained by
many such factors. While it might or might
not be the case that the numerical classes
of inherited characteristics were
recognized before Mendel, to his credit he
explicitly stated that the factors he
studied were involved with inheritance.
Indeed, this insight is evidently not
intuitive, as illustrated by those who
went before him and did not recognize
the significance of these patterns.
Sophomore genetics students also
illustrate this point. After many attempts
by the author to explain the
meaning of Mendel’s results,
an exasperated student
exclaimed during office hours:
“Why does two piles of
peas mean that a gene is
involved?”
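One way to answer the student is to show where the two piles come from. The sketch below is a hypothetical illustration. It assumes a single factor with a dominant allele written “A” and a recessive allele written “a”, and that each parent passes on one of its two copies at random. It then enumerates a Punnett square for a cross between two hybrids:

```python
from collections import Counter
from itertools import product


def cross(parent1: str = "Aa", parent2: str = "Aa") -> list[str]:
    """Return every equally likely offspring genotype from two single-locus
    parents, one allele from each parent (a flattened Punnett square)."""
    # sorted() puts the uppercase allele first, so "aA" and "Aa" are both written "Aa".
    return ["".join(sorted(g1 + g2)) for g1, g2 in product(parent1, parent2)]


def phenotypes(genotypes: list[str]) -> Counter:
    """Collapse genotypes into the two visible piles: one dominant allele is enough."""
    return Counter("dominant" if "A" in g else "recessive" for g in genotypes)


offspring = cross()
print(Counter(offspring))     # Counter({'Aa': 2, 'AA': 1, 'aa': 1})
print(phenotypes(offspring))  # Counter({'dominant': 3, 'recessive': 1})
```

The 1 AA : 2 Aa : 1 aa genotypes collapse into a 3:1 split of visible types. The hidden 1:2 split inside the dominant pile is what a further round of self-pollination would reveal.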




Statue of Gregor Mendel at the St. Thomas Abbey in Brno, Czech 
Republic, where Mendel worked. Photo by James A. Birchler. 



Menaces 

One reason that Mendel’s
work might not have met
with wide acceptance is that
it actually did not seem to
explain much about how
inheritance is realized in
practice. We now know that his
“factors” behave as they do
because they reflect the
mechanics of meiosis. Because
of this realization, we
automatically think in Mendelian
terms with regard to the
action of alleles and genes
regardless of whether or how they affect
the phenotype. But meiosis was not
known at the time. And most traits that
we typically examine are multigenic,
controlled by quantitative trait loci that are
semidominant to some degree, of small
effect, and variable in the extent to
which they affect the phenotype. Mendel
actually noted that for some characteristics
that he considered “the difference is of a
‘more or less’ nature...” and therefore he
did not use them. He refers to the
characters that he did use as
“constant characters.” He notes that previous
workers had described hybrids as often
intermediate between the parents, but he
ascribes this to the random distribution
of multiple characters that were
independent of each other. The variation for
quantitative characters is of low magnitude
and multigenic; therefore, it is
difficult to observe segregation ratios. Mendel
rationalized this by suggesting that an
astronomical number of progeny would
be needed to see the reconstitution of a
parental type. Indeed, in a broad sense,
this is true.
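Mendel’s point can be made concrete with a short calculation. This sketch rests on simplifying assumptions that are ours, not Mendel’s: n unlinked loci, two alleles per locus, an F1 heterozygous at every locus, and F2 plants produced by selfing. Under those assumptions, the chance that a single F2 plant recovers one parent’s complete homozygous genotype is (1/4)^n. On average, then, about 4^n F2 plants must be raised per such plant:

```python
from fractions import Fraction


def p_parental_genotype(n_loci: int) -> Fraction:
    """Probability that one F2 plant carries a given parent's alleles in
    homozygous form at all n_loci. Assumes unlinked loci, each segregating
    1 AA : 2 Aa : 1 aa, so each locus contributes an independent factor of 1/4."""
    return Fraction(1, 4) ** n_loci


def expected_f2_per_recovery(n_loci: int) -> int:
    """Mean number of F2 plants raised per recovered parental genotype (4**n)."""
    return int(1 / p_parental_genotype(n_loci))


for n in (1, 2, 7, 10, 20):
    print(f"n = {n:>2}: about {expected_f2_per_recovery(n):,} F2 plants per recovered parental type")
```

Even ten loci of this kind push the expected number past a million plants (4^10 = 1,048,576). Linkage, additional alleles, or counting either parent’s type as a success would change the exact figures but not the scale.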

Another feature of hybrids that
potentially confounded the acceptance and
appreciation of Mendel’s studies is the
phenomenon of hybrid vigor, or heterosis.
Mendel actually noted the more robust
nature of hybrids in his description of the
dwarf versus normal-sized pea plants.
Darwin (Darwin, 1876) also examined
this property of hybrids extensively, and
Mendel, rightly, made no attempt to explain it.

The most famous example of results that
stood in the way of acceptance was
Mendel’s use of hawkweed (Hieracium spp.) in
subsequent experiments to confirm his
results with peas, beans, and other species
(Mendel, 1870). Hawkweed exhibits very
extensive variation in form and features,
which would have made it seem an excellent
system to investigate their behavior in
hybrids and their progeny. However, the
reason for this great variability is that
hawkweed is clonally reproduced via
the process of apomixis, which bypasses
meiosis but still produces seeds. New
mutations or chromosomal abnormalities that
arise in a clone are maintained. However,
because these plants produce pollen and seeds,
one would be tricked into believing hybrids
could be made when in fact this would not
be the case. Mendel thought he had
succeeded in producing hybrids, but they
usually followed the “maternal” type,
indicating the presumed stable nature of the
characters, which did not revert
(Endersby, 2007). Ironically, this line of
investigation was encouraged by the Swiss
botanist Carl Nägeli (Mendel, 1870), a
proponent of blending inheritance, a view
held at the time: the concept that determinants
come together in hybrids and mix irrevocably.
But because hawkweed reproduces
asexually, hybrids resemble the maternal
parent and do not usually produce the
intermediate phenotypes that were the
hallmark in the perpetuation of the blending
concept.

Mendel could not have made sense of
his observations without setting aside
quantitative traits and heterosis, and
his article states that he did just that for
many aspects of hybrid plants, apparently
realizing this need. These three
“menaces” to Mendel (quantitative inheritance,
heterosis, and apomixis) represent three
little-understood aspects of genetics to
this day and are worthy of investigation.
The number of genes and their intricate
interactions affecting quantitative traits
(Mackay, 2014) and the potential
nonlinearities that they exhibit (Birchler and Veitia,
2012) are yet to be fully elucidated.
Heterosis, despite being the foundation of world
food production, has managed to conceal
its secrets (Birchler, 2015). Likewise,
apomixis (Ronceret and Vielle-Calzada,
2015), which has been proposed to fix
heterosis for clonal propagation over multiple
generations, is equally mysterious.

Mechanism 

In the highly speculative scenario that
Mendel had submitted his manuscript to
a present-day high-impact journal, it would
no doubt be dismissed as “descriptive,”
“premature,” and “lacking in mechanistic
insight,” with the result that what is
considered to be a seminal contribution
to science would be relegated to a
specialty journal. However, all of science is
descriptive; it just varies in the level and
magnitude of detail. Every novel
discovery is premature in understanding and
lacking in details of mechanism.

It would not have been possible to
define much in the way of mechanism at
the time. Mendel’s work preceded the
discovery of meiosis in 1876 by Hertwig
and its explanation by Weismann in
1890. Even after the rediscovery of
Mendel’s work, it took the suggestion of
Sutton, Boveri, and Wilson that Mendel’s
factors reside on chromosomes and the
formulation of the chromosome theory of
inheritance to gain an appreciation of
why Mendel’s factors behave as they
do. Perhaps a lesson to be learned is
that valid observations existing in a
mechanistic vacuum are to be valued and used
as an inspiration for experiments to
understand them better.

Models 

Although the details are missing and will
never be known for certain, Mendel’s
Introduction to his paper (Mendel, 1866)
suggests that he was aware of the
phenomenon of “reversion” and had a
“model” to explain this phenomenon.
The model was useful in finding a good
organism for the experiments (i.e., peas,
because they have concealed self-pollination
but can be crossed when desired)
and selecting the plant characters to
examine. Yet Mendel went on to attempt
to explain, using his model, the commonly
known observation that hybrids could
often be intermediate (now referred to as
semidominant, additive, or dosage
sensitive) in phenotype between the parents.
While there is a partial insight in his
explanation of a multifactorial basis, his
attempted explanation likely was unpersuasive
in the context of his time.

Darwin conducted extensive studies of
intentional self-pollination of a wide variety
of naturally outbreeding plant species
(Darwin, 1877) and documented the
changes seen. He clearly found everything
that Mendel did, but not in as systematic
a way. For flower morphs, called pins and
thrums, that foster outcrossing by their
alternately placed stigmas and anthers,
Darwin found what we now call dominant
and recessive forms: when hybrids
were made and the progeny self-pollinated,
the next generation “reverted” to
both grandparental types but favored
one in number (i.e., the dominant form).
In Primula vulgaris, he studied these flower
morphs in parents that also differed in
flower color, purple versus yellow; when the
hybrids were selfed, the progeny
showed a “3:1” ratio of purple to yellow.
Further self-pollination showed that the
yellow form bred true and the purple again
“reverted” with a preference for the purple
form. The flower color characteristic was
independent of the flower morphs. In his
study of tristyly with three morphs, he
also clearly states that “it is the rule that
plants thus derived usually consist of
both parental forms, but not of the third
form,” illustrating that he recognized that only
two types could be present in any one
individual. The flower morph forms could
breed true while the flower color reverted,
which we now call independent
assortment. Thus, one can recognize in Darwin’s
data dominant and recessive characters,
the fact that only two forms are present in a
hybrid, their reappearance in the next
generation, and the independence of
different characters. But Darwin did not
ascribe any “law” to these observations;
he entered the study concerned
that these plants did not naturally inbreed
and were often highly sterile when they
did, facts that he sought to understand
within the context of his concept of natural
selection. His “model,” if one will allow the
analogy, made him focus on his issue of
concern, and therefore he did not
recognize the same principles of inheritance
that Mendel did.
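The independence visible in Darwin’s data can be illustrated with a small two-locus sketch. The loci “A/a” and “B/b” are hypothetical stand-ins, for example for a flower-morph factor and a flower-color factor. The code assumes that the two loci assort independently and that each has a fully dominant allele. It does not reconstruct Darwin’s actual crosses. Crossing two double hybrids gives the familiar 9:3:3:1, and each character taken alone still shows 3:1:

```python
from collections import Counter
from itertools import product


def gametes(locus1: str, locus2: str) -> list[str]:
    """All equally likely gametes from a two-locus genotype such as ("Aa", "Bb"),
    assuming independent assortment: one allele per locus, every combination allowed."""
    return [a + b for a, b in product(locus1, locus2)]


def dihybrid_f2() -> Counter:
    """Tally phenotype classes among offspring of an Aa Bb x Aa Bb cross."""
    g = gametes("Aa", "Bb")  # ["AB", "Ab", "aB", "ab"]
    tally = Counter()
    for g1, g2 in product(g, g):
        # Position 0 of each gamete is the A/a allele, position 1 the B/b allele.
        trait_a = "A_" if "A" in (g1[0], g2[0]) else "aa"
        trait_b = "B_" if "B" in (g1[1], g2[1]) else "bb"
        tally[(trait_a, trait_b)] += 1
    return tally


f2 = dihybrid_f2()
print(f2)
# Marginal counts for the first character alone: 12 A_ and 4 aa, i.e., 3:1.
print(sum(n for (a, _), n in f2.items() if a == "A_"),
      sum(n for (a, _), n in f2.items() if a == "aa"))
```

The joint counts are the products of the two 3:1 ratios (3 × 3, 3 × 1, 1 × 3, 1 × 1), which is what “independent” means here. Linked loci would break this product rule.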

Models are good for designing experiments
to test the limits and validity of a
hypothesis. However, the originators of
models often overextend their explanatory
power. Models can also restrict one’s
thinking to a particular intellectual
framework, leaving potentially
informative experiments unimagined. The
examples of Mendel and Darwin illustrate
both of these points. This in no way
diminishes their respective contributions to
science.

Marketing 

Scientific acceptance depends on when,
where, and by whom new knowledge is
proposed. With regard to Mendel, Cock
and Forsdyke (Cock and Forsdyke,
2008) noted: “Then, as now, in marketing,
simple messages worked. Then, as now,
the same applied to the marketing of
scientific ideas. Accordingly, subtle scientific
ideas tended to lose out to simple
scientific ideas and subtle scientists tended to
lose out to the unsubtle.”

While there was excitement that
followed the “rediscovery” of Mendel’s
laws in 1900, there were many skeptics.
The British biologist William Bateson,
who had been studying discontinuous
variation in Brassica and who coined
the term “genetics,” became a traveling
salesman for Mendelian principles,
speaking in their favor far and wide with great
zeal (Cock and Forsdyke, 2008).
Ironically, Bateson himself was skeptical of
the chromosome theory of inheritance,
preferring instead to think in terms of
many independent factors determining
organismal characters (Cock and
Forsdyke, 2008). Eventually, the Drosophila
work of T.H. Morgan’s lab showing
association of genetic factors with
chromosomes in various ways convinced
Bateson (Cock and Forsdyke, 2008).

It is often stated that seminal scientific
discoveries would not go unnoticed for
long because, if they are important, others
will soon find them. But is that true? In
the last paragraph of his paper, Mendel
described white flowers with red stripes,
whose striping was likely due to a transposable
element insertion into a flower pigment
gene. Yet it was not until the 1940s, when
Barbara McClintock recognized the
subtle patterns required to decipher mobile
genetic elements (McClintock, 1950), that
this phenomenon began to be
understood. And it took further decades
before the significance of McClintock’s
discoveries was realized and their
generalization was appreciated.

A common principle often invoked in
scientific discourse is Occam’s Razor.
This principle dictates that the simplest
or most parsimonious explanation should
be favored. However, one should keep in
mind that a simple explanation that does
not explain the facts is to be discarded.
Sydney Brenner introduced the concept
of “Occam’s Broom,” which is used to
sweep inconvenient truths under the rug
to salvage the “simplest” explanation.
Knowing when to use the razor and
avoid the broom is a useful discipline when
evaluating scientific models, as the subtle
Mendel and McClintock examples attest.

And More 

In his garden
Mendel planted his peas
And made many crosses
As he would please
Next generation 
Let them self pollinate 
And counted the types 
For factors particulate 
Some seeds were yellow 
And some were green 
Three to one ratios 
In the numbers were seen 
More factors were added 
One— by— one 
Independent assortment 
When all said and done 
Some doubted his findings 
So were lost from sight 
But upon rediscovery 
Mendel was right! 

In 1915, “The Mechanism of Mendelian Heredity” was published by four prominent Drosophila geneticists. They discovered that genes form linkage groups on chromosomes that are inherited in a Mendelian fashion, and they laid the genetic foundation that established Drosophila as a model organism. Flies continue to offer great opportunities, including in the field of functional genomics.



This year we celebrate the 100th anniversary of the publication of the book “The Mechanism of Mendelian Heredity” by Thomas H. Morgan, Alfred H. Sturtevant, Hermann J. Muller, and Calvin B. Bridges (Morgan et al., 1915). The work published by these four giants of the Drosophila field was the most influential scientific work in genetics since Gregor Mendel’s work in 1866. Although Mendel’s achievements were ignored in the 19th century, the rediscovery of Mendel’s laws in 1900 led to the foundation of the field of genetics. Morgan, who initiated his work on Drosophila in 1909, was an embryologist who became attracted to flies because of the discovery of genetic variants. Interestingly, in his early career (1900-1910), Morgan was critical of the Mendelian theory of heredity and skeptical of the idea that species arise by natural selection as postulated by Charles Darwin, but he quickly changed his mind and became an advocate of Mendel’s and Darwin’s work. Moreover, in his acceptance speech for the 1933 Nobel Prize, he downplayed the contribution of Drosophila research to human biology and medicine, with one exception: genetic counseling. Later research showed that he was overly modest about the implications of Drosophila research for human biology.

Drosophila Research in the 20th Century

Morgan initiated his work on Drosophila in 1909 at Columbia University. He quickly attracted a set of superb scientists, and together they elegantly documented many of the basic tenets of genetics, discovering that factors (now known as alleles of genes) form linkage groups and that these linkage groups exhibit the same inheritance pattern as the chromosomes to which they map. Experimental data with mutants that map to the sex chromosomes in Drosophila provided the central support for their hypothesis that genes are independent physical entities, present in a linear array on chromosomes, that follow Mendel’s law of segregation. They concluded their book by stating: “Although Mendel’s law does not explain the phenomena of development, and does not pretend to explain them, it stands as a scientific explanation of heredity, because it fulfills all the requirements of any causal explanation” (Morgan et al., 1915). Despite criticism of Mendel’s work (that he had ignored or failed to report data that did not support his hypothesis), Morgan and colleagues gave Mendel proper credit for discovering the principles of heredity, as is obvious from this statement as well as from the title of their book.

Muller, Sturtevant, and Bridges, as well as other fly geneticists, continued to perform experiments that laid the basis of much of eukaryotic genetics between 1910 and 1940. Muller developed the first balancer chromosomes, which allowed him to discover that X-rays are mutagenic (Muller, 1927), a discovery for which he was awarded the Nobel Prize in 1946. Balancer chromosomes are still the most elegant means of preventing the exchange of genetic information between two homologous chromosomes, thereby giving researchers an efficient method to maintain thousands of recessive lethal and sterile stocks without the need for molecular genotyping. Sturtevant demonstrated that the Bar eye phenotype is caused by unequal crossing over, a phenomenon that plays an important role in the generation of small chromosomal duplications and deletions linked to human diseases (Lupski et al., 1996). Bridges constructed the first physical map of chromosomes for any organism by describing the banding pattern of the polytene chromosomes in the fly salivary gland, and he provided a physical map of genes on the banded chromosomes (Bridges, 1935). Bridges’ work demonstrated the correlation between the physical structure of chromosomes and genetically defined linkage groups.

Drosophila research lost prominence in the 1940s as phages and bacteria came to dominate the field of genetics. However, a rebirth occurred in the early 1970s as two fields, neuroscience and developmental biology, converged on Drosophila research. This resurgence was due in part to the reagents created by the founders, the availability of many mutations affecting numerous traits, and the ability to efficiently create new mutations (Lewis and Bacher, 1968). Indeed, no other higher eukaryotic model organism in the 1970s had tools that allowed genes to be manipulated as elegantly and probingly as in Drosophila.

The use of Drosophila as a model organism for neuroscience and developmental biology led to discoveries with a lasting impact. Seymour Benzer and colleagues studied genes affecting visual behavior, olfaction, sexual behavior, learning and memory, diurnal rhythms, aging, and neurodegeneration (Jan and Jan, 2008). Their work led to the discovery of numerous important genes and proteins, such as the first potassium and transient receptor potential (TRP) channels, key circadian clock genes, and genes required for learning and memory. Similarly, in 1978 Christiane Nüsslein-Volhard and Eric Wieschaus decided to pursue a systematic genetic strategy to screen for mutants that affect patterning of the embryo, and they discovered many of the genes that are key players in developmental signaling pathways such as Notch, Wnt, Hedgehog, TGF-β/BMP, and Toll/TLR (Nüsslein-Volhard and Wieschaus, 1980). The impact of these discoveries has permeated almost every area of biology, including medical genetics and cancer biology (Wangler et al., 2015).

The ability to manipulate the Drosophila genome was bolstered tremendously by the technology to introduce any type of DNA into the fly genome using P-element-mediated transposition (Rubin and Spradling, 1982). Since then, numerous technologies have been developed that allow extensive biological and genetic manipulation (Perrimon, 2014). This capacity to manipulate the fly genome has enabled numerous scientists to contribute significantly to almost all areas of biology, including genetics, developmental biology, cell biology, neuroscience, physiology and metabolism, disease mechanisms, population genetics, and evolution.

Drosophila as a Model System for In Vivo Functional Genomics

The breadth of tools that have been developed and that are shared among the members of the fly community, in the tradition of the founders, permits sophisticated experiments that are possible in very few other model organisms. For example, these tools are being used to tease apart neuronal networks, assess and control specific behaviors, determine gene function in specific cells, and study the physiological functions of proteins and metabolites. An area that has expanded significantly in the past 10 years is the study of fly genes whose human homologs cause genetic disorders. These studies attempt to better understand the basic biology of these genes and their products and to probe the mechanisms by which specific mutations cause pathological phenomena such as neurodegeneration (Jaiswal et al., 2012). Approximately 60% of the ~13,000 protein-coding fly genes are evolutionarily conserved in humans, yet a functional annotation of most of these genes is still lacking (Yamamoto et al., 2014). Better and more detailed annotations of the function and expression of thousands of Drosophila genes would help not only to better understand fly biology but also to functionally annotate the human genome. Here, we expand on some recently developed strategies that aim to provide functional data on fly genes and their expression patterns. These strategies also attempt to assess the function of human genes and to provide data about the pathogenic impact of human mutations or variants.

In his 2015 State of the Union Address, President Obama announced the launch of the “Precision Medicine Initiative,” with the ultimate goal of improving medical care by providing individuals with tailor-made prevention and treatment strategies. Thanks to the resources generated through the Human Genome Project and recent advances in sequencing technology and bioinformatics, human geneticists can quickly identify the majority of the polymorphisms and variants in a personal genome. The real challenge in precision medicine, however, is the interpretation of such genomic data. Our ability to extract meaningful information from whole-exome sequencing data is dampened by the existence of numerous rare variants of uncertain or unknown significance and, more importantly, by the lack of in vivo functional information for the majority of human genes. Hence, high-throughput strategies to quickly assess whether a variant of interest has functional effects are in high demand.



Although functional information can be obtained using cultured human cells, such as iPSCs, these experiments do not provide in vivo information. Drosophila is an ideal model organism to fill this niche, thanks to its short life cycle, low maintenance costs, conserved biology, and powerful genetic toolbox.

Functional annotation of genes is typically done one by one, with individual laboratories devoting years to studying the role of one or a few genes in a specific biological process or pathway. As most genes are also pleiotropic, different labs often study the same genes in different processes. This level of annotation has been the mainstay and foundation of the success of Drosophila research. In addition to this detailed level of gene characterization, a cursory but rapid functional examination of conserved genes in Drosophila can also provide important data to fill the gap between genetic and phenotypic information.

A cursory functional annotation of a gene should start with the generation of null alleles or strong loss-of-function (LOF) mutations, since these provide a reference point and a context for studying the in vivo function of the gene. Once a phenotype is identified, a human cDNA homologous to the fly gene can be integrated and expressed to test its rescuing ability. An example of a simple strategy is shown in Figure 1. Integration of the yeast GAL4 transcription factor together with a ribosome-skipping peptide (2A) into a gene of interest creates a severe LOF allele (Diao et al., 2015). Once the phenotype has been identified in the fly, rescue experiments with a UAS-human cDNA transgene expressed in the proper spatial and temporal domain permit testing the conservation of gene function between fly and human. Comparing the rescue efficiency of human cDNAs with reference (wild-type) versus variant (mutant) sequences is a rapid method for assessing whether a particular variant found in a human patient might affect the normal function of the gene. Finally, overexpression of reference and variant human cDNA sequences in wild-type flies can also reveal dominant phenotypes associated with variants found in human patients.

Another key step in the functional annotation of genes is to determine the temporal, cellular, and subcellular distribution of the protein of interest. The simplest strategy is to tag genes in genomic constructs (plasmids, fosmids, or BAC clones), generate transgenic strains, and monitor the tag (e.g., GFP) in vivo. Alternatively, the GAL4 cassette mentioned above can be replaced with an artificial exon that contains a protein tag (Venken et al., 2011). These tagged proteins are expressed under the control of endogenous regulatory elements, allowing documentation of protein expression patterns and subcellular localization without overexpression. Although the tag is internal to the protein, 75% of the GFP-tagged proteins tested so far have been shown to be functional in vivo (Nagarkar-Jaiswal et al., 2015). In summary, by combining genomic technologies, one should be able to quickly assess the LOF phenotypes and expression pattern of an as-yet-uncharacterized gene, identify its human ortholog, and assess the function of human variants.

Figure 1. Functional Annotation of Conserved Genes using Drosophila

Rapid functional annotation of conserved genes is possible in Drosophila by combining a number of technologies and resources. First, the potential fly ortholog of a human gene of interest is identified. An artificial exon that functions as a gene trap and allows expression of GAL4 (a Trojan exon cassette; Diao et al., 2015) can be introduced into an intron between two coding exons via Recombination Mediated Cassette Exchange (RMCE) of available MiMIC (Minos Mediated Integration Cassette) insertions (Venken et al., 2011). Alternatively, this can be achieved via Homology Directed Repair (HDR) using CRISPR. This Trojan exon consists of a splice acceptor (SA) followed by a ribosomal skipping peptide (2A), the GAL4 gene, and a polyadenylation (polyA) sequence, allowing the expression of GAL4 in the pattern of the gene of interest in loss-of-function (LOF) mutants. By crossing these lines with flies that carry a transgene of the human cDNA under the control of UAS (the DNA sequence recognized by GAL4), one can determine whether a human cDNA is able to rescue the fly mutant phenotype. If rescue is achieved with the wild-type (reference sequence) protein, one can further assess the function of variants found in human patients. UAS-human cDNA lines can also be used to assess dominant phenotypes (antimorphic, hypermorphic, or neomorphic) by overexpressing the human gene in a wild-type fly. MiMIC or Trojan gene traps can be converted into protein traps via RMCE, allowing intronic tagging of the gene of interest. GFP-tagged genes/proteins can be further knocked down using strategies to degrade the transcript (iGFPi) or protein (deGradFP) in a conditional and tissue-specific manner (Nagarkar-Jaiswal et al., 2015), providing stage- and tissue-specific gene function information.

Morgan may have been modest about the impact of Drosophila research on human physiology and medicine, but the long-term impact is obvious: he selected a cost-effective model organism that has provided countless insights into biology, many of which have been directly applicable to human biology and medicine. Going forward, Drosophila has the potential to keep making great contributions, and the era of functional genomics is no exception.

Regardless of where you find yourself at the moment while reading these lines, visually inspect your surroundings and you will see signs of life. I ask you now to imagine, for a moment, planet Earth devoid of all plant and animal life. For all the desolation such an image will conjure in your mind, such an Earth would still be teeming with life: life unseen, microbial life. Yet attempt to imagine an Earth without microbes and then, indeed, you will have to confront a lifeless planet. Microbial activities throughout much of our planet’s history created the setting propitious for the evolution of plants and animals, the setting that we constantly appreciate in our daily lives. This wonderful and awe-inspiring universe of the microbes, unseen creatures that have shaped the planet such that we may live in it, is engagingly presented by Paul G. Falkowski in a remarkable text entitled Life’s Engines: How Microbes Made Earth Habitable.

What is it that makes this text remarkable? In the opening pages, the author pointedly notes two major deficiencies of many past and present biology textbooks. First, microbes have been largely forgotten: “...I realized that books on biology, which were assigned to me in college, mostly ignored microbes, except as carriers of disease...”. Second, and perhaps much more importantly, in their effort to be comprehensive, such textbooks end up being intensely soporific, efficient cures for insomnia: “The biology texts I was required to read were not only inaccessible, they were downright boring. I couldn’t understand how one could take such an exciting subject, the study of life, and turn it into something so filled with irrelevant jargon.” In bringing these two problems of textbooks to the fore early on, the author is, intentionally or not, setting himself the challenge of coming up with something radically different. Without a doubt, he has succeeded. I, for one, fully agree with his assessment that the most widely used biology textbooks are indeed quite boring. Over the years, I have steadfastly refused to use textbooks in my teaching, aside from placing a copy or two on “reserve shelves,” where they tend to gather dust. But Life’s Engines is different, and I most likely will make this book required reading for my courses in the future. Not since I read the text that would propel me into my lifelong study of microbes (Gunther Stent’s Molecular Genetics, an Introductory Narrative) in the early 1970s have I been so completely taken by a biology textbook.

The book’s success is based on its utter simplicity. It tells the story of the history of life on our planet from a very personal perspective. Coming from the heart, as it does, the story is not constrained by order, though there is certainly a nice progression in the way it unfolds. Thus, you will be transported from a vessel exploring microbial life in the depths of the Black Sea to Istanbul, admiring Turkish rugs. As merchants “unroll” rugs depicting biblical stories, you will be reminded that “evolution” means “to unroll.” But understanding the origin of life on Earth will not come from the stories depicted on those rugs; rather, the perspective of the scientific study of the evolution of microbes will be needed. And so your voyage will lead you through Darwin’s fascination with fossils and the age of the Earth, and you will accompany Darwin on the Beagle and through his musings on the origin of life. Suddenly you will encounter Miller and Urey making amino acids from ammonia, methane, hydrogen, and a spark. All of this beautifully sets the stage for the simple fact that the unifying concept in biology, evolution, developed for over a century without including microbes, which had been on Earth for billions of years before the first animal evolved. And that is just the first chapter, which leaves the reader yearning to “meet the microbes and see how they played an outsized role in making this planet function. Without microbes, we would not be here.”

Because of their small size and invisibility, microbes, the oldest residents of Earth, are relative newcomers as subjects of study. Following the same narrative style that introduced the fact that microbes were largely absent from the early development of evolutionary thought, Falkowski comprehensively relates the key roles that microbes have played in Earth’s history. In this fashion, we explore early lens crafters and microscopes and launch into the history of microbiology. From simple observation to cultivation to molecular analyses, we are swept along three centuries to the revolutionary change in worldview brought about by Carl Woese’s approach to redrawing the “Tree of Life.” From this universal phylogeny, it is easy to visualize the dominance of microbes in the diversity of life and to pose the question of when life originated. And so you will take a fresh look at the composition of rocks and go back 3.5 billion years to the beginning of life, microbial life. You will then be ready to explore the function of these cells, life’s engines, from an evolutionary perspective. And we all know that “nothing in biology makes sense except in the light of evolution.” Even if you feel that you know all there is to be known about cell biology, I assure you that you will find amazing surprises in this gem of a book, not least of which, I think, is the history of the processes that led to the oxygenation of our atmosphere. Take a deep breath, appreciate the oxygen, and thank the microbes.

Here’s a suggestion. I was so enthralled by this book from the get-go that I invite you to have a short taste of it. These days, it is easy to access some pages of almost any new book online. I know from having done this myself that the Prologue to this book is readily available. I suggest you take a look inside and savor this section. In it, you will discover a young boy who became fascinated with guppies and green algae “all because of a chance encounter by my nosey, loquacious mother with a couple of graduate students in an elevator.” This boy grew up driven by curiosity about the world around him and has now decided to share his life experience with all of us lucky enough to get our hands on this book. After reading the Prologue, I am pretty sure you will want to read the whole thing.

Roberto Kolter1,*

1Department of Microbiology, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
*Correspondence: roberto_kolter@hms.harvard.edu

The most prevalent form of cystic fibrosis arises from a single amino acid deletion in the cystic fibrosis transmembrane conductance regulator, CFTR. A recently approved treatment for individuals homozygous for this mutation combines a chemical corrector, which helps CFTR fold, with a potentiator that increases CFTR channel activity.



NAME 

Orkambi, a combination of VX-809 (Lumacaftor) and VX-770 (Ivacaftor) 

APPROVED FOR 

Cystic fibrosis (CF) in patients aged 12 and older with two copies of the ΔF508 CFTR gene

TYPE 

Small molecules

MOLECULAR TARGETS 

CFTR, an anion channel in the ATP binding cassette transporter family 

CELLULAR TARGETS 

Various epithelial tissues in which CFTR regulates chloride, bicarbonate, 
and fluid secretion 

EFFECTS ON TARGETS 

Lumacaftor corrects mutant CFTR folding, and Ivacaftor potentiates CFTR channel activity. Restored CFTR trafficking and activity counter the fluid secretion defects in the pancreas, intestine, sweat glands, and lung, where they improve airway surface liquid formation and productive clearance of mucus and microbes.

DEVELOPED BY 

Vertex Pharmaceuticals and Cystic Fibrosis Foundation Therapeutics 



[Infographic] Lumacaftor rescues 30% from degradation; Ivacaftor produces a 2-fold increase in channel activity. Individuals with cystic fibrosis: 5% have benefited from Ivacaftor, and a further proportion may benefit from Orkambi.

Timeline:
1989: Cloning of the CFTR gene
1990: ΔF508 CFTR is prematurely degraded and fails to traffic to the cell membrane
1996: Chemical chaperones rescue ΔF508 trafficking to the cell surface
2009: VX-770 (Ivacaftor) augments CFTR channel opening
2011: VX-809 (Lumacaftor) facilitates ΔF508 mutant protein folding
2012: Lumacaftor lowers sweat chloride concentration
2012: Ivacaftor approved for patients with a channel gating mutation
2015: Orkambi (Ivacaftor/Lumacaftor) approved by the FDA for ΔF508 homozygous patients


References for further reading are available with this article online: www.cell.com/cell/abstract/S0092-8674(15)01123-X 

The Next Top Models



Extreme Farming 




Jian-Kang Zhu 

Shanghai Center for Plant Stress Biology 

With the advent of global warming and climate change, more frequent episodes of extreme weather are predicted, which will dramatically affect food security. In addition, there will soon be more people, more meat-based diets, and less land for crops. Yet food production needs to double by 2050 to feed the growing population. Big improvements in crop productivity under harsh environments will be required.

Model plants like Arabidopsis and rice have already paved the way to many genetic improvements in crops. Unfortunately, these model plants have not evolved to thrive in extreme environments. Some Arabidopsis relatives, specifically Thellungiella species, can instead serve as model systems for understanding how plants cope with high salinity, extreme cold, and water shortage. Besides being able to resist extreme environments, these small plants (whose genomes have been sequenced) have short life cycles, produce large numbers of seeds, and can easily be genetically transformed in large numbers. Studies of their salt tolerance revealed that these plants appear to take advantage of pre-primed stress response genes and pathways that are present, but less effective, in glycophytes such as Arabidopsis and rice. Future work on these extremophile plants promises to elucidate the strategies plants use to adapt to extreme environments and to help improve crops so that they better cope with future harsh weather.



Axolotl Legwork 




Jessica Whited 

Harvard Medical School 



Axolotl salamanders (Ambystoma mexicanum) can regenerate limbs throughout life following amputation. Their limbs are remarkably similar anatomically to human limbs, containing the full repertoire of tissues and patterned in similar ways. A vast amount of experimentation in the field of amphibian limb regeneration was conducted in previous centuries by some of the leading biologists of the time. These older, pre-molecular experiments are a goldmine for modern studies. For these reasons, the axolotl is an ideal model for understanding natural regeneration.

Axolotls form a transient niche-like structure called the “blastema” to localize and multiply activated progenitor cells. We are approaching the questions of blastema formation and function with no a priori assumptions about the genes that drive these processes, as the axolotl genome is enormous (~32 Gb) and largely unsequenced. Such an unbiased approach has only become feasible in the last several years with the advent of next-generation sequencing and the development of powerful new technologies for functional experimentation, such as transgenesis, viral transduction, and gene editing. Thus, this is the perfect time to be using axolotls to elucidate the mechanisms underlying the fascinating process of regeneration. The hope is that, in the future, this information will help in formulating hypotheses about why humans and other mammals have much more restricted natural regenerative abilities, and eventually in developing effective approaches to circumvent these limitations.



The Naked Truth 




Andrei Seluanov and Vera Gorbunova 

University of Rochester 

The naked mole rat (Heterocephalus glaber) is a mouse-sized rodent, but it lives ten times longer than the mouse and is resistant to multiple age-related diseases, most notably cancer. Traditionally, molecular biologists have focused their work on short-lived organisms, such as mice and rats, which reproduce and die rapidly, making them convenient genetic models. Although major aspects of genetics and physiology are conserved among mammals, short-lived species lack the adaptations that confer long life. Since the ultimate goal of biomedical research is to prevent disease and extend lifespan, investigating the mechanisms that confer longevity and disease resistance in long-lived species has tremendous potential. The discovery of high-molecular-mass hyaluronan, which provides cancer resistance to naked mole rats, exemplifies how studying a nonstandard species can lead to a clinically relevant molecule.

Switching to nonstandard models may seem challenging, but it is also extremely rewarding, as there is so much novel biology to be unearthed. The tools for studying these atypical organisms, from whole-genome sequencing to RNAi and CRISPR/Cas9 technologies, are rapidly improving. The naked mole rat is just one example of a mammal that has evolutionarily adapted to a long and healthy life. Many other long-lived species that have evolved unique mechanisms to stall aging and prevent disease, such as the beaver, gray squirrel, blind mole rat, Brandt’s bat, and bowhead whale, await future investigation.

Ancient Immunity




Masanori Kasahara 

Hokkaido University 



Jawed vertebrates ranging from sharks to humans mount adaptive immune responses using antigen receptors of the immunoglobulin superfamily. In contrast, the antigen receptors of jawless vertebrates such as lampreys and hagfish, known as variable lymphocyte receptors (VLRs), generate diversity comparable to that of T cell and B cell receptors by assembling multiple leucine-rich repeat modules. This striking difference between the adaptive immune systems of jawed and jawless vertebrates has catapulted lampreys and hagfish into the limelight in the immunology field. Interestingly, like jawed vertebrates, lampreys have three lineages of lymphocytes: one lineage of B-like lymphocytes and two lineages of T-like lymphocytes resembling αβ and γδ T cells, respectively. Therefore, it appears that specialized lymphocyte lineages emerged in a common vertebrate ancestor and that jawed and jawless vertebrates evolutionarily co-opted different antigen receptors within the context of these lymphocyte lineages.

Many questions remain unanswered. Do lampreys have antigen-presenting molecules with functions equivalent to those of major histocompatibility complex molecules? What is the chemical nature of the ligands recognized by VLRs expressed on αβ T-like lymphocytes? Do they recognize peptides like their gnathostome counterparts? How do lamprey lymphocytes undergo selection during their development in central lymphoid organs? Investigation of the immune system of these ancient jawless fishes will yield many more surprises and keep inspiring us for years to come.



Hive’s Logic of Life 




Gro V. Amdam 

Arizona State University and Norwegian University 
of Life Sciences 

Honey bees (Apis mellifera) provide remarkable opportunities for understanding complex behavior, with systems of division of labor, communication, decision making, and social aging and immunity. They teach us how social behaviors develop from solitary behavioral modules with only minor “tweaking” of molecular networks. They help us unravel the fundamental properties of learning, memory, and symbolic language. They reveal the dynamics of collective decision making and how social plasticity can change epigenetic brain programming or reverse brain aging. They show us the mechanistic basis of trans-generational immune priming in invertebrates, perhaps facilitating the first vaccines for insects.

These processes and more can be 
studied across the levels of biological 
complexity (from genes to societies) and
over multiple timescales (from action
potentials to evolutionary time). As models in
neuroscience and animal behavior, honey 
bees have batteries of established 
research tools for brain/behavioral pat- 
terns, sensory perception, and cognition. 
Genome sequence, molecular tools, and 
a number of functional genomic tools are 
also available. With a relatively large-sized 
body (~1.5 cm) and brain (~1 mm³), this
fascinating animal is, additionally, easy 
to work with for students of all ages. 

Beekeeping practices date back at least
to the Minoan civilization, in which the bee
symbolized a Mother Goddess. Today,
we increasingly value honey bees as 
essential pollinators of commercial crops 
and for their ecosystem services. Honey 
bees have been called keepers of the 
logic of life. Truly, they are.



Natural Neuroscience 




Nachum Ulanovsky 

Weizmann Institute of Science 



Through a reductionist approach, we 
have made great progress in brain research
by focusing on simple sensory stimuli 
and simplified, highly controlled labora- 
tory behaviors. However, this reduc- 
tionism came at a cost: it neglected 
what real brains evolved for— guiding 
behavior in real-world, complex natural 
environments. We know surprisingly little 
about natural behaviors of mice and 
rats— a fundamental gap in our under- 
standing of brain and behavior in these 
“standard mammalian models.” It is 
therefore crucial to study the neural basis 
of behavior under real-life, naturalistic 
conditions: a “Natural Neuroscience” 
approach. 

One way would be to study wild rodents 
outdoors and then implement more natu- 
ralistic experiments indoors. Another way 
is to use “atypical” mammalian models, 
such as bats. Why bats? First, much is 
known about bats’ natural behaviors in 
the field; in fact, the same species (and 
even same individuals) can be studied 
both outdoors and in the lab. Second, 
their 3D flight behaviors and long-dis- 
tance navigational skills make bats excel- 
lent models for studying the neural basis 
of navigation. Third, bat sensory inputs 
(biosonar) can be recorded with sub- 
millisecond precision, making them great 
models for “active sensing.” Finally, 
studying bats allows a “sanity check” of 
a key hidden assumption in modern 
biology— that all mammals are alike, 
from mouse to human. Comparative 
studies of bat and rodent brains— e.g., 
the role of hippocampal oscillations— 
have begun to argue against that assump- 
tion. This highlights the importance of 
studying circuits, networks, and neural 
codes across species. 

Monkey Mind: A Memoir of Aging




Guoping Feng 

MIT 



The lack of successful translation of find- 
ings from Alzheimer’s disease rodent 
models to clinical trials has sparked inter- 
est in finding better animal models. 
Although studying large non-human pri- 
mates such as macaques and Caribbean 
vervets has revealed important insights, 
their long life expectancy (up to 30 years 
in captivity) is a major drawback for longi- 
tudinal aging studies. Circumventing this 
problem, the short-lived gray mouse 
lemur (Microcebus murinus) has been
developed as an alternative primate model
for studying cerebral aging and Alz- 
heimer’s disease. Aged mouse lemurs 
(>6 years) show many human aging con- 
ditions, including cataracts, loss of olfac- 
tory acuity, reduced fine motor skills, 
and cognitive deficits. In addition, about 
5%-10% of aged mouse lemurs develop 
abnormal behaviors indicative of “patho- 
logic aging” such as aggressiveness, 
loss of social contact, and loss of 
biorhythm. Intriguingly, other pathological 
alterations, including massive brain
atrophy, loss of cholinergic neurons,
β-amyloid accumulation, and Tau aggregation,
are similar to those seen in patients
with Alzheimer’s disease.

Features of mouse lemurs, such as 
small size (60-120 g), short lifespan 
(8-12 years in captivity), high fecundity 
(2-4 offspring per year), and early sexual 
maturity (10 months), make them well
suited to genetic manipulation using
CRISPR genome-editing technology. There are
several laboratory-raised colonies around the world
that would facilitate the expansion of this 
model for aging research. 



A Fish in the Fountain of Youth 




Anne Brunet 

Stanford University 



The African turquoise killifish (Nothobran- 
chius furzeri) is a fascinating organism. It 
lives in Zimbabwe and Mozambique in 
ponds that are only present during the 
brief rainy season and has evolved a natu- 
rally compressed lifespan adapted to this 
unusual habitat. The African killifish is an 
attractive new model to study aging and 
age-dependent diseases in vertebrates. 
Aging studies have long benefited from 
invertebrate models, such as worms 
and flies. But those organisms lack organs 
or systems, including bones, blood, 
and an adaptive immune system, that are
involved in age-related diseases. On the
other hand, vertebrate models such as mice
and zebrafish are limited by their longer
lifespans (2.5 and 4-5 years, respectively).

In developing new model organisms, it 
is important to consider what unique 
aspect they bring. The African killifish pro- 
vides a natural short lifespan (4-6 months 
in optimal laboratory conditions) and 
recapitulation of age-dependent pheno- 
types and pathologies, including cogni- 
tive decline, sarcopenia, and cancerous 
lesions. It also replicates certain aspects 
of human biology better than current 
models. For example, its telomeres are 
comparable in size to those of humans. 
These characteristics, coupled with the 
ease of generating many offspring and 
low maintenance costs, make the African 
killifish a promising alternative vertebrate 
model for genetic and drug screening. 
The recent development of a genome- 
editing pipeline in this fish has the poten- 
tial to transform how we explore aging 
and disease-related genes. The African 
killifish could also provide novel insight 
into the differences in lifespan strategies 
between species. 



Vocal Learning 




Daniel Margoliash 

University of Chicago 



Speech and language are central to the 
human experience, commanding exten- 
sive study in the realm of learning and 
memory. Songbirds have emerged as a 
powerful model for analogous animal 
research, informed by and informing the 
work in humans. Research in songbirds 
has identified how individual variation in 
vocal behavior arises from genetic and 
epigenetic factors. This includes not just 
the well-known zebra finch (Taeniopygia 
guttata) but also birds of all stripes, 
with extensive species-level variation in 
learned behavior representing a unique 
opportunity for study in the animal 
kingdom. 

Combining the study of brain and 
learned behavior is also a powerful 
approach for addressing many questions 
of general interest to neurobiology. Song- 
bird research is providing insight into 
systems-level questions such as how 
auditory memories are initially estab- 
lished, are consolidated, and influence 
motor output; computational questions 
such as how motor commands relate to 
movements and how a common time 
frame is maintained given that motor 
command, muscle activation, and sen- 
sory feedback progressively lag in time; 
and in-depth analysis of how network 
properties emerge from the interactions 
of single cells— influenced by a rich 
soup of transmitters and modulators; 
and a great many others. Recently, the 
zebra finch genome has been sequenced. 
Coupled with molecular and genomic 
approaches that are being adopted, this 
is making possible additional elegant
experimental designs with songbirds.
The sky is indeed the limit, and the future 
is melodious for this attractive model 
system. 


We propose that data mining and network analysis utilizing public databases can identify and quantify
relationships between scientific discoveries and major advances in medicine (cures). Further
development of such approaches could help to increase public understanding and governmental 
support for life science research and could enhance decision making in the quest for cures. 



Governments and philanthropists provide 
financial support for life sciences primarily 
with the expectation that research will 
lead to cures— defined broadly here as 
measures to prevent, eradicate, or 
ameliorate serious diseases. However, 
public understanding of how scientific 
discoveries actually result in cures is 
limited, and research to elucidate princi- 
ples of biological processes may appear 
to non-scientists as esoteric and irrele- 
vant to public expectations. Recent 
examples of important cures are evident, 
but public support for biomedical 
research as reflected by federal funding 
for the U.S. National Institutes of Health 
has eroded over the past decade (FASEB,
2015), suggesting the absence of a strong
electoral consensus that the life science 
enterprise is meeting public expectations. 
Why is public support for life science 
research wavering at a time when the 
pace of discovery is strong and scientists 
see expanding opportunity, and can ac- 
tions to increase public understanding of 
how new cures are developed lead to 
more sustained and predictable funding 
of life science? 

We propose that data mining and 
network analytics (Nicholson, 2006;
Nishikawa and Motter, 2011) applied to what
we call “cure network informatics” could 
help to increase public appreciation of 
the societal value of life science discov- 
eries. Thoughtful metrics emerging from 
this concept can perhaps be developed
and molded into forms embraced broadly 
among life scientists and by those 
providing their funding and can be used 
to guide decision making in ways that 
would accelerate progress toward cures. 



Here, we describe a step in this direction 
by means of an analytical model and to- 
pology-based algorithms that quantify re- 
lationships between scientific discoveries 
and cures. 

We established and automated data 
collection and network analysis protocols
utilizing publicly accessible databases,
including www.fda.gov,
www.clinicaltrials.gov, www.pubmed.gov, and
www.webofknowledge.com. In a pilot
study, we considered the recently 
successful applications for regulatory 
approval of two new drugs: ipilimumab in 
oncology and ivacaftor for cystic fibrosis. 
These medical advances are sufficiently 
novel and important to be reasonably 
characterized as “cures” (vide supra).
Ipilimumab is the first successful entry into
the new and burgeoning field of immuno- 
oncology (Sharma and Allison, 2015) by 
which sustained clinical remissions are 
being induced in patients with previously 
intractable cancers by releasing immune 
effector cells from checkpoint inhibition. 
Ivacaftor restores channel function to the
cystic fibrosis transmembrane conductance
regulator (CFTR) carrying a specific
loss-of-function mutation and is the first
targeted therapy for this heritable disease.
Beginning with the references cited in
clinical trials and the information provided
to the U.S. Food and Drug Administration
(FDA) for regulatory approval of these drugs
(FDA, 2011; FDA, 2012), we extracted two
consecutive rounds of retrospective cita- 
tions and constructed network models of 
articles, authors, and institutions contributing
to the network. Assumptions underlying
this approach are: (1) that the authors
of FDA applications and clinical trials will 



appropriately cite publications reporting 
new knowledge critical to the develop- 
ment of a new drug candidate and (2) 
that further retrospective rounds of cita- 
tions will identify previous discoveries 
that were most important in establishing 
the base of knowledge that enabled the 
successful drug development program. 
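The two rounds of retrospective citation extraction described above amount to a depth-limited breadth-first traversal of reference lists, starting from the FDA application and clinical trial records. The sketch below is a minimal illustration under stated assumptions, not the authors' released code (Lotia and Pico, 2015): the `cites` mapping and all document identifiers are hypothetical toy data standing in for records pulled from the databases listed above.

```python
from collections import deque


def backward_citation_network(seeds, cites, rounds=2):
    """Follow reference lists backward from `seeds` for up to `rounds` steps.

    seeds: identifiers of the starting documents (hypothetical ids for an
           FDA application and clinical trial records).
    cites: mapping from a document id to the ids of documents it cites.
    Returns (nodes, edges), with edges as (citing, cited) pairs.
    """
    nodes = set(seeds)
    edges = set()
    frontier = deque((doc, 0) for doc in seeds)
    while frontier:
        doc, depth = frontier.popleft()
        if depth == rounds:
            continue  # do not expand beyond the last retrospective round
        for ref in cites.get(doc, ()):
            edges.add((doc, ref))
            if ref not in nodes:
                nodes.add(ref)
                frontier.append((ref, depth + 1))
    return nodes, edges


# Hypothetical toy data: p6 is three citation steps away and is excluded.
cites = {
    "fda_application": ["p1", "p2"],
    "trial_A": ["p2", "p3"],
    "p1": ["p4"],
    "p2": ["p4", "p5"],
    "p4": ["p6"],
}
nodes, edges = backward_citation_network(["fda_application", "trial_A"], cites)
print(sorted(nodes))
```

Breadth-first order guarantees that each document is first reached at its shortest citation distance, so the `rounds` cutoff matches the "two consecutive rounds" criterion even when a paper is cited at several depths.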

We learned that the nature of a cure 
discovery citation network is complex 
and fundamentally collaborative with 
respect to the number of different scien- 
tists and institutions making contributions 
to a cure. For example, the citation net- 
work leading to ipilimumab includes 
7,067 different scientists who listed 
5,666 different institutional and depart- 
mental affiliations and includes discov- 
eries spanning 104 years of research 
(Figure 1A). Results for ivacaftor are 
similar: 2,857 different scientists from 
2,516 different institutional and departmental
affiliations, with discoveries spanning
59 years of research (Figure 1B).
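Headline figures such as the number of scientists, affiliations, and years spanned reduce to set sizes and a year range over the network's articles. The sketch below assumes a hypothetical `Article` record with raw author and affiliation strings and performs no name disambiguation, which real counts would depend on. The span is computed as last year minus first year; whether the reported spans count inclusively is not stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    year: int
    authors: tuple        # author names as listed (hypothetical format)
    affiliations: tuple   # institutional/departmental affiliation strings


def summarize(articles):
    """Count distinct authors and affiliations and the span of publication years."""
    authors = {name for art in articles for name in art.authors}
    affiliations = {aff for art in articles for aff in art.affiliations}
    years = [art.year for art in articles]
    span = max(years) - min(years) if years else 0
    return len(authors), len(affiliations), span


# Hypothetical toy records.
articles = [
    Article(1990, ("Smith J", "Lee K"), ("Univ X, Dept A",)),
    Article(2010, ("Lee K", "Chen W"), ("Univ X, Dept A", "Univ Y, Dept B")),
]
print(summarize(articles))
```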

We next characterized individual scien- 
tists within each citation network by two 
metrics. Propagated in-degree rank (PIR) 
is based on the number and citation count 
of articles that a given author published 
within the citation network and is a mea- 
sure of influence within this selective set 
of publications. Ratio of basic rankings 
(RBR) is based on how selectively a given 
author published within the cure discovery
citation network relative to background
networks of topically related publications
similar in size, scope, and structure. This
ratio normalizes for each author’s overall
publication output.
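The text gives only the ingredients of PIR (number and citation count of an author's in-network articles) and RBR (selectivity relative to background networks), not their formulas. The sketch below is therefore a simplified, hypothetical stand-in, not the published definitions: each article scores 1 plus its in-network in-degree, an author's PIR-like score sums over their articles, and the RBR-like ratio divides the author's share of cure-network articles by their share of a background network. All names and toy data are illustrative.

```python
from collections import Counter


def pir_like_scores(edges, authorship):
    """Hypothetical PIR stand-in: sum over an author's in-network articles of
    (1 + number of in-network articles citing that article).

    edges: iterable of (citing, cited) article-id pairs.
    authorship: mapping from article id to a tuple of author names.
    """
    in_degree = Counter(cited for _, cited in edges)
    scores = Counter()
    for article, authors in authorship.items():
        for author in authors:
            scores[author] += 1 + in_degree[article]
    return scores


def selectivity_ratio(author, cure_authorship, background_authorship):
    """Hypothetical RBR-like ratio: the author's share of articles in the cure
    network divided by their share in a topically related background network."""
    def share(authorship):
        hits = sum(1 for authors in authorship.values() if author in authors)
        return hits / len(authorship) if authorship else 0.0

    background = share(background_authorship)
    return float("inf") if background == 0 else share(cure_authorship) / background


# Hypothetical toy networks.
edges = [("p1", "p2"), ("p3", "p2"), ("p3", "p1")]
cure = {"p1": ("Ada",), "p2": ("Ada", "Bo"), "p3": ("Bo",)}
background = {"b1": ("Ada",), "b2": ("Cy",), "b3": ("Cy",),
              "b4": ("Bo",), "b5": ("Cy",), "b6": ("Cy",)}
scores = pir_like_scores(edges, cure)
ranking = [author for author, _ in scores.most_common()]
print(ranking, selectivity_ratio("Ada", cure, background))
```

Dividing by background share is one simple way to keep prolific authors from ranking highly merely because they publish a lot; the authors' actual normalization may differ.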

By applying the metrics of PIR and RBR 
to the entire cure discovery citation 

Figure 1. Cure Networks: The Constellation of Publications, Scientists, and Institutions Contributing to Drug Discovery

The red dot at the apex of the cluster is the drug ipilimumab (A) or ivacaftor (B). Relevant clinical trials and the FDA application are illustrated in brown. Publications
cited in the clinical trials and FDA applications are shown in green. Likewise, papers cited by those publications are also shown in green. Authors of the papers are 
shown in purple, and institutional affiliations listed on the papers are shown in blue. The most influential contributors to the network as assessed by PIR and RBR 
(see Table S1), their articles, and their institutions are highlighted in yellow with red connecting lines.



network, the most influential and selective 
contributors to these massive networks 
emerge. Thus, in the case of ipilimumab, 
15 scientists and 7 institutions associated
with 433 articles spanning 46 years 
are characterized as elite performers 
(Figure 1A and Table S1). Elite performers
within the ivacaftor network exhibiting 
similar properties as defined by the 
same metrics include 33 scientists and 7 
institutions associated with 355 articles 
spanning 47 years (Figure 1B and Table
S1). These elite performer subnetworks
are integral to their overall citation net- 
works, serving as hubs for 31% of the 
ipilimumab network and 49% of the iva- 
caftor network. 
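The text does not specify how the elite subnetworks "serve as hubs" for 31% and 49% of their networks. One plausible reading, used here purely as an assumption, is the proportion of all network nodes that are elite or directly linked to an elite node. The function name and toy graph are hypothetical.

```python
def hub_coverage(edges, elite):
    """Fraction of network nodes that are elite or one link away from an elite node.

    edges: (u, v) pairs; direction is ignored for this coverage measure.
    elite: set of node ids flagged as elite performers (hypothetical).
    """
    nodes = {n for edge in edges for n in edge}
    covered = set(elite) & nodes
    for u, v in edges:
        if u in elite:
            covered.add(v)
        if v in elite:
            covered.add(u)
    return len(covered) / len(nodes) if nodes else 0.0


# Hypothetical toy graph with a single elite node "E".
edges = [("E", "x"), ("y", "E"), ("y", "z"), ("z", "w")]
print(hub_coverage(edges, {"E"}))
```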

These data quantify how the knowledge 
base on which important advances in 
medicine (“cures”) depend includes con- 
tributions from a large and diverse set of 
individual scientists working in many 
locales. This insight should be instructive 
for policy makers by suggesting that future 
cures will depend on broadly based public 
support of life sciences. Narrowly targeted 
funding initiatives may well have value but 
are unlikely in isolation to generate the 
breadth of new knowledge required to 
lay the foundation for future cures. 

We call on the scientific community to 
embrace and advance the concept of 



cure network informatics so as to develop 
advanced and sophisticated analytical 
tools to increase understanding of how 
scientific discoveries lead to cures, 
including predictive metrics that may 
guide decision making with respect to 
work in progress. All of the code neces- 
sary to reproduce and extend this initial 
effort is freely available and open source 
(Lotia and Pico, 2015). This network infor- 
matics approach can be applied to any 
“cure” with a cited publication trail.
Curators of publicly available databases
could play important roles in these efforts 
by considering cure network informatics 
in the design of database architecture 
and embedded tools. It will be important 
to identify trends that hold across all cures 
and ones that are specific to certain types 
of cures. It will also be useful to identify 
features of hubs within cure networks 
that are essential to the flow of knowledge 
required to create a cure. 

A need for better metrics for assessing 
performance and for decision support 
within the life sciences is widely acknowl- 
edged by leaders and commentators in 
biomedicine (Sarli and Carpenter, 2014; 
University of Gothenburg, 2013). Metrics 
that are readily understandable by non- 
scientists, grounded in outcomes that 
the general public values highly (cures), 



and faithful to what scientists know to be 
the richly intersecting and often unpre- 
dictable nature of scientific discovery 
should be more useful for influencing pol- 
icy makers than currently available alter- 
natives. Further development of new and 
useful tools for cure network informatics 
should contribute to increased public 
trust in, and support for, the life science 
enterprise. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes one table and 
can be found with this article online at http://dx. 


Prescott et al. take a step forward in studying primate morphological evolution by a cellular anthropology
approach. Through epigenomic profiling of in-vitro-derived cells, the authors identify and
characterize candidate cis-regulatory elements underlying divergence in facial morphology between
human and chimp, shedding new light on what makes us (look) human.



The difference between human genomic 
sequence and that of the chimpanzee is 
surprisingly small (< 2%), yet we differ 
greatly in appearance from our closest 
living relative. Although one could argue 
that the most discriminating feature is 
the human’s cognitive ability, we all would 
probably distinguish humans from chimps 
in an instant, with just a brief glance at
their faces. It is therefore interesting and 
important to study the genes and DNA 
sequence changes involved in defining 
and creating our faces and how they differ 
between individuals and between chimps 
and humans. 

It was suggested 40 years ago that the 
observed differences in physical appear- 
ance, cognition, and behavior arise from 
changes in genomic regulatory elements 
that control gene expression rather than 
changes in protein-coding regions them- 
selves (King and Wilson, 1975). Today, 
the availability of genomic sequences 
and the advancements in genome-wide 
mapping of candidate regulatory ele- 
ments provide the means to test this 
hypothesis and address the functional 
outcome of the sequence divergence 
(Carroll, 2008). In this issue, Prescott 
et al. shed light on how the distinct human 
and chimp facial characteristics might 
arise through changes in cis-regulatory
elements by a comprehensive comparative
epigenomic profiling of an in-vitro-derived 
embryonic cell type: the cranial neural 
crest cells (CNCCs) (Prescott et al., 2015).

CNCCs arise during neural tube forma- 
tion in early embryogenesis and migrate 
to the developing head, where they 
differentiate into nerves, bones, cartilage, 
and connective tissue, establishing the 
facial morphology (Santagati and Rijli, 
2003). Qualitative and/or quantitative 
differences in gene expression in these 



cells might thus directly affect the shape 
of the face, contributing to inter- and 
intra-species variation in appearance 
and other CNCC-related human traits. 
Following a strategy of comparative epi- 
genomic profiling applied previously in 
mammalian cell lines and organs (Villar 
et al., 2015) and primate corticogenesis 
(Reilly et al., 2015), Prescott et al. use 
the in-vitro-derived human and chimp 
CNCCs— a tightly matched pair of orthol- 
ogous cell types— and map several 
transcription factors and histone modifi- 
cations genome wide to predict distal- 
acting regulatory elements, also known 
as enhancers (Figure 1). Using estab- 
lished enhancer-associated chromatin 
characteristics, including p300 binding, 
chromatin accessibility, and increased 
H3K4me1/H3K4me3 ratio (Shlyueva
et al., 2014), the authors predict CNCC 
enhancers genome wide. They further 
use H3K27ac as a predictor of enhancer 
activity to assess species-specific bias 
and identify cis-regulatory elements with
putative functional divergence. 
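Using H3K27ac as a readout of species bias comes down to comparing normalized signal at each orthologous enhancer between the two species' replicates. The sketch below is a deliberately simple, hypothetical classifier: the fixed log2 fold-change cutoff and pseudocount are illustrative choices standing in for whatever statistical test the authors applied, and read counts are assumed to be already normalized and cross-mapped to both genomes upstream.

```python
import math
from statistics import mean


def call_species_bias(human_reps, chimp_reps, min_log2fc=1.0, pseudocount=1.0):
    """Label one enhancer as human-biased, chimp-biased, or shared.

    human_reps, chimp_reps: normalized H3K27ac read counts per replicate
    (e.g., three human and two chimp lines). min_log2fc and pseudocount are
    illustrative assumptions, not the study's parameters.
    Returns (label, log2 fold change of human over chimp).
    """
    lfc = math.log2((mean(human_reps) + pseudocount) / (mean(chimp_reps) + pseudocount))
    if lfc >= min_log2fc:
        return "human", lfc
    if lfc <= -min_log2fc:
        return "chimp", lfc
    return "shared", lfc


# Hypothetical counts for one enhancer.
print(call_species_bias([30, 34, 29], [7, 8])[0])
```

Averaging replicates before taking the ratio ignores replicate variance; a count-based model with per-replicate dispersion would be the more defensible choice for real data.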

To control for experimental and intraspecies
variability, Prescott et al. use cells
derived from two chimp individuals and 
three humans. The high similarity between 
human and chimp genomic sequence 
further allows the authors to map reads 
from each species to both reference 
genomes, which circumvents problems 
that might arise during coordinate transla- 
tion based on whole-genome sequence 
alignments but potentially restricts the 
analysis to the more conserved parts of 
the genome. With this approach, Prescott 
et al. predict ~14,500 CNCC enhancers,
of which 13% showed species-biased
H3K27ac enrichment, roughly half biased
toward human and half toward chimp. The
functional relevance of identified species-biased enhancers is supported by the
observation that nearby genes, which 
are enriched for craniofacial functions, 
are more likely to be differentially ex- 
pressed, with the direction of expression 
change being in agreement with the 
enhancer bias. Indeed, testing nine 
chimp-biased and eight human-biased 
enhancers by luciferase assays suggests 
that >80% of the candidates have spe- 
cies-biased enhancer activity. Together, 
this supports the notion that quantitative 
modulation of enhancer activity is the 
main source of functional divergence 
between closely related species. It will 
be interesting and important to explore 
how many of the almost 2,000 species- 
biased candidates differ in enhancer 
activity (Arnold et al., 2014) and cause dif- 
ferential expression of their target genes. 
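The counts in this paragraph can be cross-checked with simple arithmetic: 13% of ~14,500 predicted enhancers is about 1,885 (approximate, since the total is approximate), consistent with "almost 2,000" species-biased candidates; and ">80%" of the 17 enhancers tested by luciferase assay means at least 14 of them. The exact number validated is not given here.

```python
predicted_enhancers = 14_500   # "~14,500 CNCC enhancers" (approximate)
biased_fraction = 0.13         # 13% with species-biased H3K27ac enrichment

species_biased = round(predicted_enhancers * biased_fraction)

tested = 9 + 8                 # chimp-biased + human-biased luciferase constructs
# Smallest count strictly above 80%, using integer arithmetic:
# m / tested > 4/5  <=>  5*m > 4*tested.
min_validated = (4 * tested) // 5 + 1

print(species_biased, tested, min_validated, round(min_validated / tested, 3))
```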

To assess the activity of predicted 
enhancers in vivo, the authors tested 
several species-biased CNCC enhancers 
in mouse embryos and showed their 
differential activity in head and face re- 
gions. This further demonstrates that the 
respective cell types also exist in mice
and that the enhancer divergence is a 
result of sequence changes between hu- 
man and chimp and not the differences 
in trans-regulatory environments of their
CNCCs. This is in line with the model 
that the evolution of morphological diversity
is driven by cis-regulatory mutations
that affect developmental expression 
patterns (Carroll, 2008). Prescott et al. 
further explore sequence features under- 
lying enhancer divergence with a focus 
on transcription factor (TF) binding sites, 
which are central to enhancer function. 
They show that species-biased en- 
hancers harbor very few (three to six) sub- 
stitutions that nevertheless cause dra- 
matic changes in chromatin signatures 

Figure 1. The Cellular Anthropology Approach

By deriving cranial neural crest cells (CNCC) for both human and chimp 
in vitro, Prescott et al. gain insight into the gene regulatory processes that 
lead to the establishment of species-specific facial morphology during 
embryogenesis (Prescott et al., 2015). Through epigenomic profiling of 
these cells, the authors identify distal-acting regulatory elements, known as 
enhancers, that show species-biased activity and likely drive species- 
specific CNCC gene expression. The observed epigenomic divergence 
allows them to link sequence changes in cis-regulatory elements with changes
in gene expression that underlie differences between the human and 
chimp face. 



associated with active en- 
hancers if they affect binding 
sites of key TFs, emphasizing 
these TFs’ importance for 
gene expression in CNCCs. 

The authors make an impor- 
tant distinction between TF 
motifs that are depleted in 
species-biased enhancers 
likely due to selective pres- 
sure, as they play a central 
role in establishing CNCC 
identity, and those that are 
enriched and thus more likely 
involved in regulating spe- 
cies-specific CNCC gene 
expression. 

Among the latter group, a
novel binding motif stands 
out as highly enriched in 
divergent enhancers and 
shows strong correlation with 
enhancer activity. Interest- 
ingly, this coordinator motif, 
as the authors termed it, is 
present in species-biased en- 
hancers of both human and 
chimp, and the motif strength 
in either species correlates 
with H3K27ac levels. The
coordinator motif is therefore 
not species specific but is 
redirected in a species-spe- 
cific fashion to distinct sets 
of enhancers that likely re- 
gulate different subsets of 
genes and drive the diver- 
gence of primate CNCCs 
and their descendants. Considering that 
the coordinator motif is a combination 
of two very prominent binding sites, 
the E-box and the homeodomain binding
motif, both of which are bound by many
different TFs, it will be interesting to
identify the specific TFs that bind to the
coordinator. These TFs likely play a more
general role in CNCCs and in establishing
facial morphology and, through mutations
that modulate their binding sites at
divergent enhancers, are deployed to
regulate species-specific CNCC expression
patterns.
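Since the coordinator motif is described as a combination of an E-box and a homeodomain binding motif, a crude way to look for coordinator-like sites is to scan for the two cores near each other. The sketch below assumes the E-box consensus CANNTG, the homeodomain core TAAT (or ATTA on the other strand), and a maximum gap of 10 bp in either order; the real motif's sequence, spacing, and orientation are not given here, and exact-match regular expressions are only a rough stand-in for scoring with a position weight matrix.

```python
import re

EBOX = "CA[ACGT]{2}TG"     # E-box consensus CANNTG
HOMEO = "(?:TAAT|ATTA)"    # homeodomain core, either strand


def composite_sites(seq, max_gap=10):
    """Return (start, matched_text) for E-box/homeodomain pairs within max_gap bp.

    Both orders are allowed. The zero-width lookahead lets overlapping
    candidates be reported; max_gap is an illustrative assumption.
    """
    gap = f"[ACGT]{{0,{max_gap}}}"
    pattern = re.compile(f"(?=({EBOX}{gap}{HOMEO}|{HOMEO}{gap}{EBOX}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


# Hypothetical sequence: E-box, 3 bp spacer, homeodomain core.
print(composite_sites("ggCAGCTGaaaTAATgg"))
```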

There are several evolutionary mecha- 
nisms by which such an extensive 
CNCC enhancer divergence might have 
emerged in such a short evolutionary 
time. Regions of the human genome that 
evolve unexpectedly fast, termed human 



accelerated regions (HARs), have been
shown to be involved in regulating hu- 
man-specific traits, including the develop- 
ment of the human brain (Pollard et al., 
2006). Prescott et al. now show that they 
might also be involved in the development 
of human-specific facial morphology. 
Even though only 20 HARs overlap the 
species-biased CNCC enhancers, their 
relative enrichment is significant, and 
additional HARs might fall into regions
with larger sequence divergence than 
those considered here (due to the cross- 
mapping requirement). Another important 
aspect of cis-regulatory evolution is
transposable elements, whose involve- 
ment in gene regulation has long been 
postulated (Davidson and Britten, 1979), 
and their role in the evolution of regula- 
tory elements has been demonstrated in 



multiple studies since. In line 
with this, Prescott et al. reveal 
significant enrichment of spe- 
cific classes of retrotranspo- 
sons in species-biased en- 
hancers. More than half of 
the divergent enhancers over- 
lap with at least one of 
the major retrotransposon 
families, including endogenous
retroviruses and L1
elements. As these were pre- 
sent in the primate lineage 
before human and chimp 
separated, they possibly con- 
tain progenitor sequences 
that evolved into CNCC en- 
hancers. Indeed, the authors 
show that LTR9 retroele- 
ments, which are enriched 
at species-biased enhancers, 
often harbor a variant of the 
coordinator motif indepen- 
dently of whether they reside 
in CNCC enhancers or 
elsewhere in the genome. It 
is thus likely that some of 
these elements evolved into 
CNCC enhancers by acquir- 
ing mutations that converted 
a progenitor sequence into 
a strong coordinator motif 
capable of binding key TFs. 
Although this gain of function 
happened in both species, 
the loss of coordinator motif 
function is equally likely to 
have contributed to enhancer 
divergence between human and chimp 
through disruption of the original ancestral 
motif in one of the species. 
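The observation that more than half of the divergent enhancers overlap at least one major retrotransposon family is an interval-overlap computation. The sketch below uses hypothetical toy coordinates in half-open [start, end) intervals (the BED convention) and a brute-force comparison that is only suitable for small examples; genome-scale analyses would typically use a dedicated tool such as `bedtools intersect -u`.

```python
def fraction_overlapping(enhancers, elements, families=None):
    """Fraction of enhancers overlapping at least one annotated element.

    enhancers: (chrom, start, end) tuples, half-open [start, end).
    elements:  (chrom, start, end, family) tuples, e.g. retrotransposon
               annotations; if `families` is given, only those families count.
    Brute-force O(n*m) comparison, adequate for a small illustration.
    """
    selected = [el for el in elements if families is None or el[3] in families]
    hits = 0
    for chrom, start, end in enhancers:
        if any(c == chrom and s < end and start < el_end
               for c, s, el_end, _ in selected):
            hits += 1
    return hits / len(enhancers) if enhancers else 0.0


# Hypothetical toy coordinates.
enhancers = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
elements = [("chr1", 150, 170, "ERV"), ("chr1", 590, 700, "L1"),
            ("chr2", 80, 90, "ERV"), ("chr2", 60, 70, "Alu")]
print(fraction_overlapping(enhancers, elements, {"ERV", "L1"}))
```

With half-open intervals, the chr2 enhancer ending at 80 does not overlap the ERV starting at 80, and the Alu overlap is ignored by the family filter.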

Interestingly, the authors show that the 
strongly divergent enhancers tend to 
cluster along the genome according to 
their species bias, forming larger genomic 
regions that often overlap genes critical 
for facial morphogenesis, which are differ- 
entially expressed between human and 
chimp. Some of these genes, for instance 
PAX3, have been previously associated 
with intra-human facial variation and 
were implicated in the development of facial
malformations. It will be interesting in 
the future to link specific regulatory re- 
gions and the genes that they regulate to 
the different morphological and physio- 
logical traits of the human face, which 
given the contribution of CNCCs to 
nerves, muscles, and throat cartilages
might even include aspects of verbal 
and non-verbal communication. Taken 
together, the CNCC enhancers identified 
by Prescott et al. represent a compre- 
hensive resource for studies of human 
evolution and the genetic basis of 
variations in facial morphology. Their 
work also introduces the concept of 
cellular anthropology that, by studying 
developmentally relevant cell types 
in vitro, attempts to elucidate mecha- 
nisms underlying morphological evolu- 
tion in primates. Together with the re- 
cent advances in molecular paleontology 
enabling the sequencing of genomes 



of extinct close human relatives, these 
novel approaches make this a very 
exciting time to study human evolution. 


Zeng et al. reveal that the lipolytic effect of the hormone leptin is mediated by sympathetic nerve
fibers that directly “envelop” white adipocytes. Local activation of the sympathetic input to the
fat opens new avenues to circumvent central leptin resistance in obesity.



In the past 20 years, scientists have un- 
covered the mechanisms by which the 
hormone leptin affects different brain 
areas. Although it is also well known that 
central administration of leptin results in 
a myriad of effects in the periphery
(Halaas et al., 1995, 1997), the mecha- 
nism of action of these peripheral effects 
at the various tissue sites has remained 
elusive (Balthasar et al., 2004; Cowley 
et al., 2001). In particular, the exact 
mode of signaling by which leptin triggers 
changes in white adipose tissue (WAT) 
function had yet to be identified. In this
issue of Cell, Zeng et al. (2015) resolve 
the mystery, showing that the sympa- 
thetic nervous system (SNS) is the fine 
effector of leptin’s action on WAT. 

The authors begin by visualizing the 
sympathetic innervation of the inguinal 
WAT. Then, applying state-of-the-art
stimulatory and inhibitory optogenetic
approaches at peripheral sites, they
reproduce or block, respectively, the effects
of leptin on the WAT (Figure 1 ). Their study 
goes beyond just the delineation of how 
leptin triggers lipolysis in WAT. It puts for- 
ward an exciting experimental design, 
which adapts very advanced and power- 
ful techniques to the study of the sympa- 
thetic innervation of peripheral tissues, in 
this case the WAT. First, they use optical 
projection tomography and two-photon
microscopy to identify the precise sites
of SNS innervation of WAT in vivo, a feat 
that could not be accomplished before. 
Then they apply optogenetics to stimulate 
axonal projections at the post-ganglionic 
level. To date, optogenetic approaches 
have been predominantly used to interro- 
gate neuronal function in circuits of the 
central nervous system. This paper sets 
the stage for the application of optoge- 
netics to interrogate the functional role 



of various peripheral neuronal circuits in 
system physiology. 

Many efforts have been put forth during 
the last decades to identify the type of 
innervation that is present in the WAT. 
Some studies erroneously postulated the 
existence of parasympathetic innervation 
associated with the vasculature (Gior- 
dano et al., 2006). Conversely, other 
studies investigated the role of the SNS 
innervation of WAT (Bartness et al., 
2014; Diculescu and Stoica, 1970). It has 
been reported that through the manipula- 
tion of sympathetic efferents, it is possible 
to alter lipid mobilization in different fat 
depots (Bartness et al., 2014). Other 
studies have shown SNS innervation of 
WAT through the use of neuroanatomical 
methods, using retrograde tracers and 
immunohistochemical analyses (Bartness 
et al., 2014; Diculescu and Stoica, 1970). 
Nonetheless Zeng et al. are the first to 
nerves, muscles, and throat cartilages might even include aspects of verbal and non-verbal communication. Taken together, the CNCC enhancers identified by Prescott et al. represent a comprehensive resource for studies of human evolution and of the genetic basis of variation in facial morphology. Their work also introduces the concept of cellular anthropology, which attempts to elucidate the mechanisms underlying morphological evolution in primates by studying developmentally relevant cell types in vitro. Together with recent advances in molecular paleontology that have enabled sequencing of the genomes of extinct close human relatives, these novel approaches make this a very exciting time to study human evolution.
fibers that directly “envelop” white adipocytes. Local activation of the sympathetic input to the fat opens new avenues to circumvent central leptin resistance in obesity.



In the past 20 years, scientists have uncovered the mechanisms by which the hormone leptin affects different brain areas. Although it is also well known that central administration of leptin has a myriad of effects in the periphery (Halaas et al., 1995, 1997), the mechanism underlying these peripheral effects at the various tissue sites has remained elusive (Balthasar et al., 2004; Cowley et al., 2001). In particular, the exact mode of signaling by which leptin triggers changes in white adipose tissue (WAT) function had yet to be identified. In this issue of Cell, Zeng et al. (2015) resolve the mystery, showing that the sympathetic nervous system (SNS) is the effector that mediates leptin’s action on WAT.

The authors begin by visualizing the sympathetic innervation of the inguinal WAT. Then, applying state-of-the-art stimulating and inhibiting optogenetic approaches at peripheral sites, they reproduce or block, respectively, the effects of leptin on the WAT (Figure 1). Their study goes beyond delineating how leptin triggers lipolysis in WAT. It puts forward an exciting experimental design that adapts advanced and powerful techniques to the study of the sympathetic innervation of peripheral tissues, in this case the WAT. First, they use optical projection tomography and two-photon microscopy to identify the precise sites of SNS innervation of WAT in vivo, a feat that had not been accomplished before. Then they apply optogenetics to stimulate axonal projections at the post-ganglionic level. To date, optogenetic approaches have been used predominantly to interrogate neuronal function in circuits of the central nervous system. This paper sets the stage for applying optogenetics to interrogate the functional role of various peripheral neuronal circuits in systemic physiology.

Over the last decades, many efforts have been made to identify the type of innervation present in the WAT. Some studies erroneously postulated the existence of parasympathetic innervation associated with the vasculature (Giordano et al., 2006). In contrast, other studies investigated the role of the SNS innervation of WAT (Bartness et al., 2014; Diculescu and Stoica, 1970). It has been reported that manipulating sympathetic efferents can alter lipid mobilization in different fat depots (Bartness et al., 2014). Other studies have demonstrated SNS innervation of WAT by neuroanatomical methods, using retrograde tracers and immunohistochemical analyses (Bartness et al., 2014; Diculescu and Stoica, 1970). Nonetheless, Zeng et al. are the first to provide direct evidence for the link between the central effects of leptin and lipid mobilization in the different fat depots. The fact that central leptin promotes lipolysis in fat, together with evidence of increased sympathetic activity after leptin treatment (Pellegrino et al., 2014), led the authors to investigate whether the SNS mediates the lipolytic effects of leptin. They find that the release of sympathetic catecholamines in the fat depots is required for the lipolytic effects of leptin. These data challenge previous hypotheses that leptin-induced lipolysis in WAT is due to the effects of other circulating hormones. Rather, the new data show that signals from different areas of the brain reach the WAT through the SNS. Axonal projections form neuro-adipose junctions, where catecholamines (in this case norepinephrine) are released to trigger lipolysis. Because leptin affects systemic metabolism by acting on many other peripheral tissues, it will be intriguing to apply the same approach to other systems as well. In the future, refinements of this technology could allow selective correction of impaired function in peripheral tissues, which may vary between patients with metabolic disturbances such as obesity.

Figure 1. Post-ganglionic Axonal Projections Innervate Fat Depots

Stimuli from different areas of the brain reach fat depots. Zeng et al. provide evidence that the WAT is innervated by axonal projections from ganglionic neurons. Optogenetic stimulation of these axons at the post-ganglionic level can mimic leptin’s effects on lipolysis in WAT, whereas their genetic, chemical, or physical inhibition can block these effects.

Since the discovery of leptin in 1994 (Zhang et al., 1994), thousands of studies have set out to understand its mechanisms of action both centrally and in peripheral tissues. Despite all the advances, many questions remain to be answered before we fully understand how leptin controls integrative physiology (Balthasar et al., 2004; Cowley et al., 2001). Investigators continue to use elaborate and complicated genetic models to this end. However, fewer studies have attempted to dynamically identify the signaling modalities by which leptin controls lipid mobilization and other peripheral functions. This paper is an excellent example of this latter approach, looking beyond individual steps of the process and focusing on the whole body. By extension, the implications of this study are not limited to leptin and lipolysis; rather, it offers a new approach and vista for investigating the actions of other peripheral hormones in integrative physiology.
Exploiting the dependence of cancer cells on transcription can be an effective strategy for targeting aggressive and therapeutically recalcitrant tumors. Wang et al. show that inhibiting transcription using THZ1, a small-molecule inhibitor of the cyclin-dependent kinase CDK7, induces apoptotic cell death in triple-negative breast cancers.



Large-scale, deep-sequencing-based genomic analyses have revealed a surprising level of genetic heterogeneity in cancers. In some cancers, “driver” mutations in key oncogenes can be parsed out from the background of genetic heterogeneity. Some of these oncogenes (such as BRAF and EGFR) encode mutant proteins that provide opportunities for rational targeting by drugs (for instance, targeting BRAF V600E with vemurafenib and dabrafenib in melanoma, or targeting EGFR L858R or exon 19 deletions with afatinib and gefitinib in non-small-cell lung cancer) (Bollag et al., 2012; Sordella et al., 2004). For other cancers, such as triple-negative breast cancer (TNBC), definitive driver mutations have not been identified, and these cancers therefore lack targeted therapies. In spite of a high level of genetic heterogeneity, TNBCs maintain a characteristic and readily identifiable pattern of gene expression. In this issue of Cell, Wang et al. (2015) hypothesize that maintaining a uniform gene expression program in TNBCs requires continually active transcription, which might make these cancers highly sensitive to drugs that inhibit transcription (Figure 1A).

Recent studies using small-molecule inhibitors of the transcriptional machinery have shown promising selectivity for cancer cells and potent antiproliferative effects. For example, targeting bromodomain and extra-terminal (BET) family proteins, such as BRD4, with JQ1 has exploited the dependence of certain cancers on the transcription of critical driver oncogenes (e.g., c-Myc), rendering these cancers sensitive to transcriptional inhibition (Delmore et al., 2011). More recently, THZ1, a selective covalent inhibitor of the cyclin-dependent kinase CDK7, has been shown to inhibit the growth of several cancers, such as T cell acute lymphoblastic leukemia (Kwiatkowski et al., 2014), MYCN-amplified neuroblastoma (Chipumuro et al., 2014), and small-cell lung cancer (Christensen et al., 2014).

Wang et al. (2015) explore the therapeutic potential of targeting CDK7 in TNBC, a therapeutically recalcitrant subtype of breast cancer that does not express estrogen receptors (ERs, the target of the first-line therapeutic tamoxifen), progesterone receptors (PRs), or the HER2 receptor (ERBB2, a receptor tyrosine kinase). Using THZ1 and CRISPR/Cas9-mediated gene editing, the authors observe that TNBC cells, but not ER-positive/PR-positive breast cancer cells, are highly dependent on the transcriptional functions of CDK7. Inhibition of CDK7 with THZ1 promotes apoptotic cell death in both TNBC cell lines and patient-derived tumor samples.

CDK7 is a cyclin-dependent kinase (CDK) and a subunit of the multi-protein basal transcription factor TFIIH. As such, it plays dual roles in the regulation of cell-cycle progression and transcription. As a component of the CDK-activating kinase (CAK), CDK7 controls the cell cycle by phosphorylating other cell-cycle CDKs, such as CDK1 and CDK2 (Malumbres, 2014). As a component of TFIIH, CDK7 regulates transcription initiation by phosphorylating serines 5 and 7 of the heptapeptide repeats in the C-terminal domain (CTD) of the largest subunit (RPB1) of RNA polymerase II (Pol II) (Malumbres, 2014) (Figure 1B). In their study, Wang et al. (2015) demonstrate that THZ1-mediated inhibition of CDK7 does not alter the cell cycle in TNBC cells, suggesting that the sensitivity of TNBCs to THZ1 is mediated through transcriptional inhibition.

Following from their initial observations, the authors postulate that TNBCs depend on the uninterrupted transcription of a key set of genes whose expression supports the cancer phenotype. Indeed, they identify a set of ~450 genes whose expression is highly sensitive to inhibition of CDK7 by THZ1, which they refer to as an “Achilles cluster” of TNBC-specific genes (Figure 1A). Gene ontology analyses revealed that this gene set is enriched for factors involved in signaling and transcription regulation, including genes encoding signaling molecules and transcription factors with established roles in breast cancer (e.g., TGFB, STAT, and l/l/A/7). Interestingly, these genes are associated with large clustered enhancer regions (so-called “super-enhancers”), which are required to drive high-level expression of these genes. The authors posit that targeting CDK7-dependent transcription is an effective way to collectively suppress the expression of multiple genes that are critical for the proliferation of TNBC cells. As such, this gene set may have utility as a prognostic signature for tumors that can be treated effectively with THZ1.

Recent studies have suggested that cancer cells have a higher overall transcriptional output than non-malignant cells (Lin et al., 2012). This may increase the likelihood that these cancer cells engage oncogenic pathways (Figure 1C). Inhibition of transcription may, therefore, reduce the transcriptional output of cancer cells to levels that are less likely to feed into oncogenic pathways (Figure 1C). However, because transcription plays a universal and essential role in all cells, targeting it as a therapeutic strategy may be challenging owing to a potential lack of selectivity for cancer cells over normal cells. Therefore, it is imperative to determine whether the therapeutic window between efficacy against malignant cells and toxicity to non-malignant cells is large enough to produce a therapeutically useful effect (Figure 1D).

Figure 1. Inhibition of CDK7 by THZ1 and Its Therapeutic Implications

(A) Cancer cells are characterized by increased genomic heterogeneity compared to normal cells. In some cancers, driver mutations in key oncogenes can be targeted therapeutically to inhibit cancer cell growth or induce apoptosis. In cancers with no clear driver mutations, such as TNBC, inhibitors such as THZ1 can be used to target “transcription addiction” to a set of “Achilles cluster” genes that encode factors involved in signaling and transcription regulation. Many cancer cells can be killed with DNA-damaging agents, which wreak havoc on the genome.

(B) THZ1 covalently binds to and inhibits the activity of CDK7 (a subunit of TFIIH), preventing phosphorylation of the C-terminal domain (CTD) of the largest subunit (RPB1) of RNA polymerase II (Pol II) and inhibiting productive transcription initiation.

(C) Current models posit that tumor cells have a higher overall transcriptional output than normal cells, allowing for more opportunities to engage oncogenic pathways. General inhibition of transcription can have a therapeutic benefit by decreasing overall transcriptional output to levels similar to those observed in normal cells.

(D) Because of the universal role and biological importance of transcription in all cells, the therapeutic window between efficacy and toxicity for malignant and non-malignant cells will determine whether THZ1 can be used to treat human patients.

This situation is analogous, in some respects, to the use of DNA-damaging therapeutics. All cells need to maintain a certain level of genome integrity to survive, but highly proliferative cells, such as cancer cells, are more sensitive to the effects of DNA-damaging drugs. In fact, this is the basis of synthetic lethality with PARP inhibitors in BRCA1/2-deficient cancers. Whereas current chemotherapeutic agents that induce DNA damage exploit the requirement for cancer cells to frequently replicate their genomes, a similar dependence on transcription can now be exploited with transcription inhibitors, such as THZ1.

In spite of recent advances in genomic sequencing, clear driver mutations have yet to be discovered for TNBCs. While ER-positive/PR-positive breast cancers are effectively treated with hormone therapies, the more aggressive TNBCs lack targeted therapies, and cytotoxic chemotherapy remains the standard treatment (Mayer et al., 2014). Thus, THZ1 and its derivatives are promising candidates for the treatment of TNBCs. Wang et al. (2015) have developed an analog of THZ1 (THZ2) with improved pharmacokinetics and few side effects in mouse xenograft models, which may be more useful clinically. The growth-promoting pathways that are activated in cancer cells involve multiple redundancies. Therefore, the efficacy of a targeted agent that selectively inhibits one pathway may be undermined by the activation of a compensatory pathway (Mayer et al., 2014). Targeting transcription more generally may be effective in these cases. Moreover, combining targeted agents with transcriptional inhibitors may be an effective approach for the treatment of TNBC and may minimize therapeutic resistance in these difficult-to-treat cancers. If THZ1-based therapy can be translated into the clinic, identifying biomarkers, perhaps an enhancer signature, that predict whether a given tumor will be sensitive to CDK7 inhibition will be essential.

In sum, this study highlights the enormous potential for targeting transcriptional addiction in aggressive tumors.


30 Cell 163, September 24, 2015 ©2015 Elsevier Inc.







Cell 



Chipumuro, E., Marco, E., Christensen, C.L., Kwiatkowski, N., Zhang, T., Hatheway, C.M., Abraham, B.J., Sharma, B., Yeung, C., Altabef, A., et al. (2014). Cell 159, 1126-1139.

Christensen, C.L., Kwiatkowski, N., Abraham, B.J., Carretero, J., Al-Shahrour, F., Zhang, T., Chipumuro, E., Herter-Sprie, G.S., Akbay, E.A., Altabef, A., et al. (2014). Cancer Cell 26, 909-922.

Delmore, J.E., Issa, G.C., Lemieux, M.E., Rahl, P.B., Shi, J., Jacobs, H.M., Kastritis, E., Gilpatrick, T., Paranal, R.M., Qi, J., et al. (2011). Cell 146, 904-917.

Kwiatkowski, N., Zhang, T., Rahl, P.B., Abraham, B.J., Reddy, J., Ficarro, S.B., Dastur, A., Amzallag, A., Ramaswamy, S., Tesar, B., et al. (2014). Nature 511, 616-620.

Lin, C.Y., Lovén, J., Rahl, P.B., Paranal, R.M., Burge, C.B., Bradner, J.E., Lee, T.I., and Young, R.A. (2012). Cell 151, 56-67.

Malumbres, M. (2014). Genome Biol. 15, 122.

Mayer, I.A., Abramson, V.G., Lehmann, B.D., and Pietenpol, J.A. (2014). Clin. Cancer Res. 20, 782-790.

Sordella, R., Bell, D.W., Haber, D.A., and Settleman, J. (2004). Science 305, 1163-1167.

Wang, Y., Zhang, T., Kwiatkowski, N., Abraham, B.J., Lee, T.I., Xie, S., Yuzugullu, H., Von, T., Li, H., Lin, Z., et al. (2015). Cell 163, this issue, 174-186.



Revealing the Complexity of Retroviral Repression 

Gernot Wolf and Todd S. Macfarlan*

The Eunice Kennedy Shriver National Institute of Child Health and Human Development, The National Institutes of Health, Bethesda, MD 20892, USA

Correspondence: todd.macfarlan@nih.gov 
http://dx.doi.org/10.1016/j.cell.2015.09.014

Retroviral restriction is a complex phenomenon that, despite remarkable recent progress, is far 
from being well understood. In this Preview, we introduce an insightful study by Yang et al. that 
represents the first attempt to identify the global determinants of retroviral repression in pluripotent 
mammalian cells. 



To protect their genomic integrity, animals control retroviral infections by establishing heritable epigenetic silencing of the integrated provirus in early embryonic development. In mouse embryonic stem cells (ESCs), KAP1 (Trim28) is targeted to newly integrated Moloney murine leukemia virus (MMLV) by the Krueppel-associated box (KRAB) zinc finger protein ZFP809. KAP1, in turn, recruits histone-modifying enzymes, including the histone methyltransferase SETDB1 (ESET), which deposit repressive histone H3 lysine 9 trimethylation (H3K9me3) marks at the provirus (Figure 1) (Matsui et al., 2010; Rowe et al., 2010; Wolf and Goff, 2009). The KRAB/KAP1 system also represses endogenous retroviruses (ERVs), which are potentially hazardous remnants of retroviral germline infections (Matsui et al., 2010; Rowe et al., 2010; Wolf et al., 2015). Additionally, several cofactors of the KRAB/KAP1 system, as well as KAP1-independent retroviral repression pathways, have been identified over the last few years. Indeed, the abundance and sequence diversity of exogenous and endogenous retroviruses likely drove the evolution of complex and partially redundant repression mechanisms that keep these elements under control. Moreover, some ERVs have been co-opted as new regulatory elements and, in some cases, have rewired entire transcriptional networks (Macfarlan et al., 2012). Retroviral repression mechanisms might therefore also regulate transcription of cellular genes. Despite recent progress in the field, deciphering the complexity and interconnectivity of retroviral repression pathways and networks remains an outstanding problem of mammalian genome biology. In their Resource article, Yang et al. (2015) perform a genome-wide small interfering RNA (siRNA) knockdown screen in a first attempt to globally determine the components of the retroviral repression machinery in mammalian pluripotent cells.

The siRNA screen was performed using an MMLV reporter that is repressed by ZFP809/KAP1 and is therefore primarily aimed at identifying cofactors acting up- and downstream of the KRAB/KAP1 system, but also at potentially overlapping KAP1-independent repression pathways. Apart from previously known factors, including ZFP809, KAP1, and SETDB1, hundreds of new repression candidates were identified. As expected, many candidates are associated with chromatin modification, DNA methylation, and regulation of transcription. Additionally, the screen identified genes involved in protein sumoylation, DNA repair, and DNA replication, and even factors located outside of the nucleus (e.g., plasma membrane, cytoskeletal, and organelle proteins). These findings highlight the complexity of retroviral restriction networks in mammalian cells, although many of these factors may not repress retroviruses primarily, specifically, and/or directly. The candidate list is a valuable resource for future studies that may address how these factors mediate retroviral restriction and ultimately help us better understand how epigenetic silencing of retroviruses is established, maintained, and inherited during development.

Two of the newly identified repression mechanisms are subsequently analyzed in detail to validate the significance of the screen: KAP1 sumoylation and chromatin assembly at proviruses. Yang et al. show that KAP1 sumoylation by SUMO2 is required for KAP1 recruitment to the MMLV provirus and ERVs and thus for epigenetic silencing of these elements. This supports previous findings that sumoylation of KAP1 by SUMO2 is essential for forming a stable KRAB/KAP1/SETDB1 repression complex (Ivanov et al., 2007). CHAF1A, one of the top hits in the screen, is a core component of chromatin assembly factor-1 (CAF-1). CAF-1 depletion impairs repression of newly integrated proviruses and also promotes reactivation of several ERV families, many of which are bound by both CAF-1 and KAP1 (Yang et al., 2015). Although this implies that CAF-1 is a component of the KRAB/KAP1 silencing system, SUMO2 knockdown, which results in KAP1 loss at the MMLV provirus, does not disrupt CAF-1 binding (Yang et al., 2015). This indirectly indicates that CAF-1 recruitment to retroviruses is independent of KAP1 binding. The question remains: how is CAF-1 targeted to retroviral elements? One possibility is that CAF-1 reassembles histones at repressed retroviral elements after DNA replication and thus helps to maintain heterochromatin marks, as previously suggested (Yu et al., 2015). In this model, free histone H3, monomethylated by SETDB1, is incorporated into newly synthesized heterochromatic DNA by CAF-1 and is further methylated to form stable heterochromatin on the newly synthesized strand via heterochromatin protein 1 (HP1), which binds to the H3K9me3 mark (Yu et al., 2015). CAF-1 co-immunoprecipitates with both SETDB1 and HP1, possibly explaining its localization at retroviral elements (Figure 1) (Yang et al., 2015; Yu et al., 2015). However, it has yet to be determined whether CAF-1 localization at KAP1-controlled ERVs is indeed replication dependent and whether the chromatin assembly function and/or the PCNA- and HP1-interacting functions of CAF-1 are required for ERV silencing. Furthermore, it remains open whether CAF-1 localizes exclusively at ERV-associated heterochromatin or also at non-viral genes that are repressed by KRAB/KAP1, for example, imprinted genes. Nevertheless, the findings of Yang et al. strongly support a role for CAF-1 in the establishment and/or maintenance of heterochromatin at ERVs after DNA replication.

Figure 1. SUMO2/CAF-1-Assisted Retroviral Silencing

Simplified models of SUMO2/CAF-1-assisted retroviral silencing are depicted. (Top) The MMLV provirus and some endogenous retroviruses are targeted by various KRAB zinc finger proteins (KRAB-ZFPs) that recruit the KAP1 corepressor. KAP1, among other functions, recruits SETDB1, which deposits the repressive H3K9me3 mark on histone H3. HP1, a reader of the H3K9me3 mark, may recruit CAF-1. Alternatively, SETDB1, which co-immunoprecipitates with CAF-1, might be involved in CAF-1 recruitment to the proviral DNA. In concert with other factors, CAF-1 delivers an H3-H4 dimer onto the retroviral DNA during replication to maintain the repressive mark. (Bottom) MERVL elements are not repressed by the KRAB/KAP1/SETDB1 complex and lack the H3K9me3 mark. Instead, MERVL ERVs are repressed by the H3K4 demethylase KDM1A and histone deacetylases, such as HDAC2, both of which remove histone modifications associated with open chromatin and transcription. CAF-1 knockdown results in epigenetic changes and retroviral de-repression, but also in the activation of MERVL-regulated 2C genes. This ultimately facilitates the transition of ESCs to a more epigenetically pliable state, similar to that of two-cell (2C) embryos.

Interestingly, CAF-1 is also recruited to ERVs that are not bound by KAP1 or SETDB1 and lack H3K9me3, especially class III ERVs, which consist primarily of MERVL elements (Figure 1). These elements are among the ERVs with the highest reactivation levels in CAF-1-depleted cells (Yang et al., 2015), an observation supported by a recent report (Ishiuchi et al., 2015). This indicates that CAF-1 may target and repress different ERV classes by entirely different mechanisms. Ishiuchi et al. (2015) showed that MERVL repression by CAF-1 requires the chromatin assembly activity of CAF-1, but not its functional interaction with HP1 and PCNA. Moreover, arresting CAF-1 knockdown ESCs at G1/S prevented MERVL reactivation, indicating that CAF-1 acts to repress MERVL elements during or after DNA replication (Ishiuchi et al., 2015). Yang et al. also show that KDM1A (LSD1), which physically interacts with CAF-1, is strongly enriched at CAF-1-bound MERVL elements. KDM1A represses MERVL elements in ESCs (Macfarlan et al., 2011), but KDM1A binding at MERVL ERVs had not been demonstrated previously. It remains unclear, however, how CAF-1 and KDM1A are targeted to these elements.

Depletion of KDM1A in ESCs leads to de-repression of MERVL transcripts and MERVL-associated genes, but also to an increased number of spontaneously arising cells resembling two-cell-stage embryos (2C-like cells) within the ESC population (Macfarlan et al., 2012). Interestingly, CAF-1 knockdown ESCs exhibited similar phenotypes (Ishiuchi et al., 2015). Moreover, the nuclei of 2C-like cells originating from CAF-1 knockdown ESCs were also shown to lack chromocenters and to be more efficiently reprogrammed by nuclear transfer into enucleated oocytes (Ishiuchi et al., 2015), supporting an important link between CAF-1/KDM1A-mediated retroviral repression and cellular epigenetic potential in early development.

Altogether, Yang et al. provide a valuable resource of candidate retroviral repressors identified through a genome-wide siRNA knockdown screen. Importantly, several of the newly identified factors are confirmed to function in pathways that had not previously been directly associated with retroviral repression. This validation strongly suggests that the screen identified bona fide candidates, whose further investigation will not only deepen our understanding of the complex retroviral restriction networks but also reveal new regulatory mechanisms in retrovirus-derived transcriptional networks.
Environmental adaptation, predisposition to common diseases, and, potentially, speciation may all be linked through the adaptive potential of mitochondrial DNA (mtDNA) variants that alter bioenergetics. This Perspective synthesizes evidence that human mtDNA variants may be adaptive or deleterious depending on environmental context and proposes that the accrual of mtDNA variation could contribute to animal speciation via adaptation to marginal environments.



The mitochondrial DNA (mtDNA) genes of different human populations encompass polymorphisms that alter amino acids that appear invariant across diverse animal species. Given the functional importance of the 13 mtDNA oxidative phosphorylation (OXPHOS) genes, one would expect purifying selection to conserve functionally important amino acids across species, and hence to keep them invariant among individuals within the same species. Yet this is not the case. Why?

mtDNA Variation and the History of Women 

The maternal inheritance of human mtDNA and its high mutation rate have resulted in the sequential accumulation of mtDNA genetic variants along radiating maternal lineages. The resulting mtDNA mutational tree encompasses clusters of related mtDNA haplotypes, known as haplogroups, which arose in geographically localized indigenous populations. Hence, the human mtDNA phylogeny and the geographic distribution of the associated indigenous populations have permitted the reconstruction of the origins and ancient migrations of women (Figure 1).

The mtDNA tree is rooted in Africa between about 130,000 and 170,000 years before present (YBP). For the first ~100,000 years, mtDNAs radiated within Africa, generating a plethora of African-specific mtDNA haplogroups (L0, L1, L2, L3, etc.) that, in aggregate, are referred to as macrohaplogroup L. Between 45,000 and 65,000 YBP, two mtDNAs, M and N, emerged from within L3 in northeast Africa and successfully left Africa, founding macrohaplogroups M and N, which colonized the rest of the world. Macrohaplogroup N gave rise to multiple European, Asian, and Native American mtDNA lineages, while macrohaplogroup M gave rise only to Asian and Native American haplogroups.

The migration of women out of Africa and around the world was associated with four striking regional mtDNA discontinuities. First, only M and N mtDNAs colonized Eurasia and the Americas. Second, while N haplogroups dispersed throughout Europe and Asia, M haplogroups were confined to Asia. Third, of all the Asian M and N mtDNA lineages, only haplogroups A, C, and D became enriched in Northeast Siberia and were poised, around 20,000 YBP, to cross the Bering Land Bridge into the Americas. Finally, only haplogroup B mtDNAs colonized the Pacific Islands. The discovery of these regional haplogroup discontinuities has led to the hypothesis that specific mtDNA haplogroups may have been functionally constrained by regional environmental selection (Cann et al., 1987; Denaro et al., 1981; Kivisild et al., 2006; Merriwether et al., 1991; Mishmar et al., 2003; Wallace, 2005, 2013a, 2013b).

Mitochondrial Genetics and Bioenergetics 

The mtDNA codes for the most important polypeptides of the mitochondrial energy-generating system, OXPHOS: the ND1, ND2, ND3, ND4, ND4L, ND5, and ND6 genes of complex I; the cytochrome b gene of complex III; the COI, COII, and COIII genes of complex IV; and the ATP6 and ATP8 genes of complex V. In addition, the mtDNA codes for the 22 tRNAs and two rRNAs required for mitochondrial protein synthesis, plus an ~1,000-nucleotide “control region” that regulates mtDNA transcription and replication (Wallace et al., 2013).

Mitochondrial OXPHOS generates much of the cell's energy by oxidizing dietary calories with oxygen. As electrons pass down the electron transport chain (ETC) through complexes I, III, and IV to reduce oxygen, the energy released is used to pump protons out across the mitochondrial inner membrane, generating a proton electrochemical gradient. This gradient can be used by the ATP synthase (complex V) to drive ATP synthesis. However, mitochondrial OXPHOS also modulates the cellular redox state, reactive oxygen species (ROS) production, pH and Ca2+ levels, apoptotic initiation, and, via tricarboxylic acid cycle intermediates, signal transduction pathways and the epigenome (Picard et al., 2014; Wallace, 2005; Wallace and Fan, 2010; Wallace et al., 2010, 2013).

The critical role played by the mtDNA genes in OXPHOS means that the mtDNA polypeptide genes should be highly evolutionarily conserved. Yet the mtDNA has a very high sequence evolution rate. Since most functional mtDNA mutations would be deleterious, the high mutation rate should create a high genetic load and imperil the survival of the species (Wallace, 2013a). This conundrum is resolved by the unique intracellular mtDNA population genetics of the female germline (Wallace and Chalkia, 2013).

Figure 1. Regional Radiation of Human mtDNAs from their Origin in Africa and Colonization of Eurasia and the Americas Implies that Environmental Selection Constrained Regional mtDNA Variation

All African mtDNAs are subsumed under macrohaplogroup L and coalesce to a single origin about 130,000-170,000 YBP. African haplogroup L0 is the most ancient mtDNA lineage and is found in the Koi-San peoples; L1 and L2 are found in Pygmy populations. The M and N mtDNA lineages emerged from sub-Saharan African L3 in northeastern Africa, and only derivatives of M and N mtDNAs successfully left Africa, giving rise to macrohaplogroups M and N. N haplogroups radiated into European and Asian indigenous populations, while M haplogroups were confined to Asia. Haplogroups A, C, and D became enriched in northeastern Siberia and were positioned to migrate across the Bering Land Bridge 20,000 YBP to found Native Americans. Additional Eurasian migrations brought haplogroups B and X to the Americas. Finally, haplogroup B colonized the Pacific Islands. Figure reproduced from MITOMAP (2015).

Maternally inherited mtDNA mutations arise among the hundreds to thousands of mtDNAs within female germline cells, each new mutation creating a mixture of normal and mutant mtDNAs, a state known as heteroplasmy. As a heteroplasmic cell divides by mitosis or meiosis, the mutant and normal mtDNAs undergo replicative segregation, becoming randomly distributed among the daughter cells. The mammalian oocyte contains several hundred thousand mtDNAs, which do not actively replicate after fertilization until the blastocyst stage. Because these mtDNAs are partitioned among the dividing embryonic cells without replication, the resulting primordial germ cells contain only a couple of hundred mtDNAs. Subsequent mtDNA replication in the derived oogonia leads to proto-oocytes with re-expanded populations of several thousand mtDNAs. This repeated contraction and expansion of the intracellular mtDNA population causes rapid genetic drift of heteroplasmic mtDNAs, generating proto-oocytes enriched for either the mutant or the normal mtDNAs (Wallace and Chalkia, 2013).
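The drift produced by the contraction and re-expansion described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not a model from the cited work: each mtDNA copy is treated as independently mutant with probability equal to the parent cell's heteroplasmy, the bottleneck is a single random sample of copies (about 200, the "couple of hundred" in the text), re-expansion is a second random sample of 5,000 copies (standing in for "several thousand"), and selection against severe mutations is ignored. The function names and parameter values are hypothetical.

```python
import random
import statistics


def sample_mutant_fraction(rng: random.Random, n_copies: int, p_mutant: float) -> float:
    """Draw n_copies mtDNAs, each mutant with probability p_mutant,
    and return the mutant fraction (a binomial draw divided by n)."""
    mutant = sum(1 for _ in range(n_copies) if rng.random() < p_mutant)
    return mutant / n_copies


def simulate_proto_oocytes(heteroplasmy: float, bottleneck_size: int,
                           expanded_size: int, n_cells: int,
                           seed: int = 0) -> list[float]:
    """Toy germline model (hypothetical): contract the mtDNA pool to
    bottleneck_size copies, re-expand it to expanded_size copies by
    random replication of the founders, and record each proto-oocyte's
    resulting heteroplasmy (mutant fraction)."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_cells):
        founders = sample_mutant_fraction(rng, bottleneck_size, heteroplasmy)
        expanded = sample_mutant_fraction(rng, expanded_size, founders)
        fractions.append(expanded)
    return fractions


if __name__ == "__main__":
    # Start from a 50% heteroplasmic mother; compare two assumed bottleneck sizes.
    for size in (200, 20):
        cells = simulate_proto_oocytes(0.5, bottleneck_size=size,
                                       expanded_size=5000, n_cells=300)
        print(f"bottleneck={size:>3}: mean={statistics.mean(cells):.3f} "
              f"sd={statistics.stdev(cells):.3f} "
              f"range=[{min(cells):.2f}, {max(cells):.2f}]")
```

In this sketch the mean heteroplasmy is preserved while its spread across proto-oocytes grows as the assumed bottleneck shrinks, which is the qualitative effect the text describes; the absolute numbers depend entirely on the assumed sizes.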

The proto-oocytes and/or oocytes with the most severe mtDNA mutations are then selectively eliminated prior to or soon after fertilization. This is possible because, unlike anatomical alterations, which require the developmental elaboration of structures before they can be acted on by selection, mitochondrial physiological alterations are expressed at the single-cell level. Hence, cells with highly deleterious mtDNA mutations and associated bioenergetic perturbations can be detected and eliminated within the ovary. This permits the mtDNA to have a high mutation rate without the species acquiring excessive genetic load (Fan et al., 2008; Sharpley et al., 2012; Stewart et al., 2008). Through this system, bioenergetic variation is continuously introduced into the population, providing a powerful tool for animal adaptation to changing environments.

Regional mtDNA Variation and Functional Consequences

Figure 2. Hypothesized Role of mtDNA Variation in Animal Environmental Adaptation and Speciation

This figure portrays the environmental space (niche) of successively evolving species (green, orange, and blue horizontal bands). The left-to-right expanse represents the range of ecological zones for each species, with the center (M) being the optimal environment and the left and right vertical dashed lines (SD1 and SD2) representing increasingly marginal environments. Successive mtDNA mutations occurring over time are represented by branch points on black lines, most of them neutral. As a new species expands from its optimal niche into more marginal environments, occasional mtDNA mutations arise that are physiologically beneficial in the suboptimal environment (blue circles at branch points). These lineages become enriched by adaptive selection as additional neutral and adaptive mutations accumulate, creating a haplogroup. The same environmental constraint can select for the same mutation on different mtDNA lineages. Occasionally, one mtDNA lineage located at the extreme edge of the species' niche (left and right edges) permits a subpopulation to persist long enough for nDNA variants to arise that permit switching of food source (energy reservoir), leading to speciation (open circle crossing species boundaries). Previously adaptive mtDNA variants now become suboptimal in the new niche, and revertants are selected, permitting energetic re-adaptation back to M.

The central role of the mtDNA genes in OXPHOS, and of OXPHOS in cellular physiology, means that functional variants in the mtDNA can have profound effects on human biology. For example, the efficiency with which the ETC generates the proton gradient and with which the proton gradient is converted into ATP is referred to as the coupling efficiency, and humans can differ in their coupling efficiency due to mtDNA polymorphisms. Since a dietary calorie is a unit of heat, any energy from a calorie burned by the mitochondrion that is not captured as ATP is released as body heat. Tightly coupled mitochondria generate the maximum ATP and minimum heat per calorie burned and thus could be beneficial in warmer climates, while loosely coupled mitochondria must burn more calories for the same amount of ATP, generating more heat, and could be of benefit in colder climates. Variation in OXPHOS can also affect ROS production, which influences cell growth, signaling, inflammation, and predilection to infection; Ca2+ levels, which regulate cellular and organ homeostasis; and the levels of high-energy intermediates that can regulate the epigenome.
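The coupling-efficiency trade-off described above can be made explicit with a simple energy balance. This is an illustrative simplification, not the author's formulation: it assumes each oxidized calorie $E$ is split only between energy captured as ATP ($E_{\mathrm{ATP}}$) and energy released as heat ($Q$), and defines a coupling efficiency $\eta$ as the captured fraction; the symbols $A$, $\eta$, and $Q$ are introduced here for illustration.

$$
E = E_{\mathrm{ATP}} + Q, \qquad \eta = \frac{E_{\mathrm{ATP}}}{E}
$$

For a fixed cellular ATP demand $A$ (so that $E_{\mathrm{ATP}} = A$), the calories that must be burned and the heat released are

$$
E = \frac{A}{\eta}, \qquad Q = E - A = A\,\frac{1-\eta}{\eta}
$$

Lowering $\eta$ therefore increases both fuel consumption and heat output for the same ATP yield. With hypothetical values, dropping $\eta$ from 0.4 to 0.2 raises the heat released per unit of ATP energy, $Q/A$, from 1.5 to 4.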

Consistent with the proposed importance of mtDNA variation in human adaptation, regional haplogroups are generally founded by one or more functionally significant polypeptide, tRNA, rRNA, and/or control region variants. These variants are retained in the descendant mtDNAs, creating the haplogroups. For example, at the macrohaplogroup level, the out-of-Africa macrohaplogroup N was founded by two amino acid variants: ND3 nucleotide (nt) 10398G>A (A114T) and ATP6 nt 8701G>A (A59T). These variants alter mitochondrial membrane potential and Ca2+ regulation (Kazuno et al., 2006), potentially changing the coupling efficiency in a way that could be advantageous in colder climates. The European macrohaplogroup N-derived haplogroup J was founded by the reversion of the N-defining ND3 10398G>A variant and the acquisition of a new ND5 13708G>A (A458T) variant. Haplogroup J radiation gave rise to subhaplogroup J1c, founded by a cytochrome b variant at 14798T>C (F18L), and subhaplogroup J2, with a cytochrome b variant at 15257G>A (D171N). European haplogroup U was founded by the tRNALeu(CUN) 12308A>G variant and gave rise to subhaplogroup Uk, which encompasses the ATP6 9055G>A (A177T) variant and an independent recurrence of the cytochrome b 14798T>C (F18L) variant (Ruiz-Pesini et al., 2004; Ruiz-Pesini and Wallace, 2006). These haplogroup-founding polypeptide variants change amino acids that otherwise show high interspecific evolutionary conservation, in some cases extending even to bacteria. Yet these and multiple other variants of highly conserved amino acids have been retained in the human population in the face of purifying selection for tens of thousands of years, have recurred multiple times, and have become enriched in regional populations to generate regional haplogroups (Kivisild et al., 2006; Mishmar et al., 2003; Ruiz-Pesini et al., 2004, 2007; Ruiz-Pesini and Wallace, 2006; van Oven and Kayser, 2009).

That haplogroups have physiological consequences is suggested by the association of haplogroups T and U with reduced sperm motility (Montiel-Sosa et al., 2006; Ruiz-Pesini et al., 2000), the enrichment of haplogroups J and Uk in Finnish sprinters and of haplogroup I in distance runners, and the enrichment of haplogroup L0 in Kenyan elite distance runners (Table S1). Moreover, climatic differences correlate with mtDNA rather than nDNA variation (Balloux et al., 2009), and the basal metabolic rate of Siberian populations enriched for haplogroups A, C, and D is higher than that of more southern populations (Leonard et al., 2002; Snodgrass et al., 2005, 2008).

A more direct demonstration of the adaptive importance of mtDNA variants comes from studies of the mtDNA ND1 nt 3394T>C (Y30H) variant. The rare 3394C allele is greatly enriched in high-altitude Tibetans relative to low-altitude Asians (odds ratio ~24), arose three independent times on macrohaplogroup M mtDNAs, and increases in frequency with the altitude of Tibetan villages; moreover, an analogous variant (ND1 Y30C) has been found in the high-altitude Ethiopian monkey Theropithecus gelada. This suggests that the 3394C allele is adaptive at high altitudes when it arises on M haplogroups. However, the 3394C allele has not been observed on Tibetan N haplogroups, suggesting that it may be deleterious when it arises on macrohaplogroup N mtDNAs (Ji et al., 2012).

To determine the physiological consequences of the 3394C variant in association with various mtDNA haplogroups, the mtDNAs of interest have been introduced into cultured cell lines by transmitochondrial cybrid production (Trounce et al., 1996). Transfer of 3394T>C (Y30H) mtDNAs into an osteosarcoma nuclear background revealed that the ND1 3394C allele on macrohaplogroup N haplogroup B or F mtDNAs reduced complex I activity by 7% to 28%. However, complex I-specific activity differed by 30% between haplogroups B and F harboring the 3394T allele, a greater difference than that seen within either haplogroup B or F when comparing the 3394T and 3394C alleles. Moreover, when the 3394C variant occurred on the macrohaplogroup M background, as in Tibetan haplogroup M9, complex I-specific activity was as high as that of the most active macrohaplogroup N haplogroup B mtDNA with the 3394T allele (Ji et al., 2012). Hence, individual mtDNA single-nucleotide polymorphisms and the haplogroup background interact to modulate mitochondrial bioenergetics.

Functional differences have been observed between other haplogroups in osteosarcoma cybrids. Comparison of H versus J cybrids revealed that J mtDNA cells have reduced levels of mtDNA, mtDNA transcripts, mitochondrial translation products, oxygen consumption, membrane potential, and ATP (Gomez-Duran et al., 2012). Relative to haplogroup H cybrids, Uk cybrids have lower mtDNA, mitochondrial RNA, and mitochondrial protein synthesis levels; reduced complex IV activity; increased oxygen consumption; and reduced inner membrane potential, suggesting reduced coupling efficiency (Gomez-Duran et al., 2010). The control region variant 295C>T is associated with increased binding of the transcription factor TFAM to the L-strand promoter, increased L-strand transcripts, and increased mtDNA copy number (Suissa et al., 2009).

Comparison of haplogroup H and J mtDNAs on a retinal pigment epithelial (RPE) nuclear background revealed that J mtDNA cells have reduced ATP, ROS, and reactive nitrogen species levels; increased lactate and growth rate; reduced expression of the macular degeneration gene CFH; altered expression of genes involved in cell signaling, inflammation, and metabolism; and an altered response to UV exposure (Kenney et al., 2014a; Malik et al., 2014). Comparison of European H versus African L mtDNAs in RPE cells showed that the L mtDNA cells had lower ATP turnover rates, reduced spare respiratory capacity, reduced mtDNA copy number, increased mtDNA mRNA levels, and altered expression of nuclear complement, inflammation, and autoimmunity genes (Kenney et al., 2014b). Transferring mouse mtDNAs from one inbred nuclear background to another, or mixing two normal mtDNAs within the mouse germline, resulted in significant phenotypic differences (Fischer Lindahl et al., 1991; Roubertoux et al., 2003; Sharpley et al., 2012), an effect also seen in Drosophila (Meiklejohn et al., 2013; Zhu et al., 2014). Therefore, naturally occurring mtDNA variation can have profound effects on cellular physiology, growth characteristics, and inflammatory systems.

While the population substructure of mtDNA variation can result from genetic drift (Cann et al., 1987; Kivisild et al., 2006; Wallace, 2013a), in this Perspective I explore the hypothesis that a portion of mtDNA sequence variation, particularly among the haplogroup-founding functional mtDNA variants, has been acted on by adaptive selection. This is because these mtDNA variants fulfill all of the criteria currently used to argue for positive selection acting on protein-coding genes (Nielsen et al., 2007). They change evolutionarily conserved amino acids; they have recurred multiple times throughout human radiation; they are associated with the expansion of rare haplotypes into regional polymorphic haplogroups; they lead to geographically constrained population haplogroups; they increase in frequency where the environmental challenge is apparent (e.g., altitude); and they change physiological phenotypes, cellular functions, and nuclear gene expression profiles of direct relevance to regional environmental challenges (Nielsen et al., 2007).

mtDNA Variation in Disease 

The importance of mtDNA variation is demonstrated by the wide range of common clinical phenotypes that have been associated with mtDNA haplogroups. The penetrance of the milder Leber hereditary optic neuropathy (LHON) mutations in mtDNA complex I genes is increased if the LHON mutation arose on a haplogroup J or an ND1 3394C-bearing mtDNA (Ji et al., 2012; Sadun et al., 2011; Wallace et al., 1988). In fact, mtDNA haplogroups have been associated with a wide range of metabolic, degenerative, infectious, and autoimmune diseases, selected examples of which are listed in Table S2.

Specific mtDNA haplogroups have also been associated with predisposition to various cancers (Table S2). Additionally, cancer cells can acquire de novo mtDNA mutations within the control region and the tRNA, rRNA, and protein-coding genes, a subset of which may be identical or similar to variants associated with regional haplogroups (Brandon et al., 2006).

One mechanism by which mtDNA variation can have such profound effects on cellular and organismal phenotypes is retrograde signaling to the nucleus. Patients heteroplasmic for the mtDNA tRNALeu(UUR) nt 3243A>G mutation can present with diabetes when they harbor 20%-30% mutant (3243G) mtDNAs, with neuromuscular degenerative disease at 50%-90% mutant mtDNAs, and with lethal perinatal disease at ~100% mutant mtDNAs. Relative to osteosarcoma cybrids with 0% mutant mtDNAs, physiological and molecular analysis showed that 20%-30% mutant cybrids had reduced OXPHOS without glycolytic compensation; 50%-90% mutant cybrids showed strong induction of glycolytic genes with declining OXPHOS; and 100% mutant cybrids experienced severe reductions in both glycolysis and OXPHOS. These marked changes in patient phenotype with mtDNA genotype correlate with four distinct phases of transcriptional patterns, corresponding to 0%, 20%-30%, 50%-90%, and 100% 3243G mutant. Thus, continuous changes in the mtDNA genotype must signal to the nucleus through cellular signal transduction pathways and the epigenome to regulate gene expression. However, the nucleus appears able to respond in only four discrete ways, thus creating the abrupt phase changes in gene expression and phenotype (Picard et al., 2014).

mtDNA Variation and Speciation 

The discoveries that the female germline generates a high frequency of mild functional mtDNA variants, that functionally important OXPHOS gene variants have arisen repeatedly within the mtDNA phylogeny throughout human history, and that selected variants become regionally enriched have led to the hypothesis that the mtDNA provides a powerful adaptive engine for mammals to cope with environmental change (Figure 2). As a corollary to this hypothesis, the rapid elaboration of adaptive mtDNA variants could permit subpopulations of a species to survive and prosper in “marginal” environments, becoming progressively isolated from parent populations. These metastable peripheral populations could then, in theory, persist long enough for the much slower accumulation of adaptive nDNA mutations in both bioenergetic (Gershoni et al., 2010, 2014; Mishmar et al., 2006) and structural genes (Nielsen et al., 2007; Sabeti et al., 2007). Ultimately, the accumulated mtDNA and nDNA adaptive variants could alter a subpopulation's physiology and anatomy sufficiently to permit a switch to a new primary food source (energy resource), resulting in a new niche and thus speciation.

How, then, can the apparent conservation of mtDNA sequence across species be explained? With the acquisition of a more abundant energy resource, many of the selective pressures that originally drove the enrichment of regional mtDNA variants would be relieved for the new species. The high mtDNA sequence evolution rate plus adaptive selection would then favor, in the new species, the reversion of previously adaptive but now maladaptive variants back to the more generally optimal bioenergetic alleles. If only mtDNAs from the central populations of different species are sequenced, only the common optimal allele would be observed, giving the false impression that the amino acid at that site is invariant.

Conclusion 

The process of mtDNA adaptive radiation, combined with nDNA-mtDNA coevolution leading to speciation and the reversion of intraspecific adaptive mtDNA mutations, can explain several seemingly anomalous facts. These include why mutations in apparently highly conserved OXPHOS amino acids can occur multiple times within a species and repeatedly rise to polymorphic frequencies; why mtDNA phylogenies coalesce with the origins of species (Cann et al., 1987; Merriwether et al., 1991; Mishmar et al., 2003) while nDNA variants such as HLA alleles or the Tibetan Denisovan EPAS1 allele (Huerta-Sanchez et al., 2014) are retained across related species; and why an mtDNA variant can be advantageous in one environmental context and deleterious in another. This latter phenomenon may be relevant to the rise of common disease phenotypes such as diabetes, obesity, and neurodegenerative disease, as the globalization of regional diets reaches carriers of non-regional mtDNA haplogroups or as migration transfers regional mtDNA haplotypes to new environments, thus converting an adaptive mtDNA genotype into a maladaptive one. Thus, the unique features of the mtDNA may require a reassessment of some of our core assumptions about human genetics and evolutionary theory.
Significant advances have been made in developing novel therapeutics for cancer treatment, and targeted therapies have revolutionized the treatment of some cancers. Despite this promise, only about five percent of new cancer drugs are approved, and most fail due to lack of efficacy. This suggests that current preclinical methods are limited in their ability to predict successful outcomes. Such failure exacts an enormous cost, both financially and in quality of human life. This Primer explores the current status, promise, and challenges of preclinical evaluation in advanced mouse cancer models and briefly addresses emerging models for early-stage preclinical development.



Explosion of Cancer Therapies and Challenges to Clinical Success

Ever-increasing knowledge of cancer biology has yielded countless possibilities for diagnostic and therapeutic strategies (Figure 1), while at the same time revealing enormous disease complexities that challenge clinical success. Such challenges include the complexity of the tumor microenvironment, intra- and intertumor molecular and biological heterogeneity, heterogeneity of systemic and tumoral immune and metabolic responses, and the ability of drug-resistant stem-like cancer-initiating cells to repopulate treated cancers (Pattabiraman and Weinberg, 2014). Too often, experimental targeted therapies designed to account for known disease complexity have proven ineffective, only highlighting the limitations in our understanding. In contrast to most experimental targeted therapies, a number of cell-based and targeted immunotherapies have produced encouraging, sustained responses in patients (Page et al., 2014). However, only a fraction of patients respond to these therapies.

Over the last decade, cancer classification has shifted from relying solely on histopathologic properties to including key molecular attributes that can predict therapeutic outcomes. The principle that certain molecular aberrations are targets for effective therapy first entered clinical practice in 1995, after acute promyelocytic leukemia (APL) bearing the PML-RARα translocation was shown to be sensitive to retinoic acid (tretinoin) (Quignon et al., 1997), which targets the RARα component to induce leukemic cell differentiation. Soon thereafter, Herceptin (a Her2-targeted antibody) was approved for treating Her2+ breast cancer (1998), and Gleevec (a BCR-ABL inhibitor) was approved for CML treatment (2001). These highly effective drugs rapidly became the standard of care. Although these successes establish the promise of targeted therapies, most attempts to attain similar results by targeting known molecular drivers have failed, and the reasons are often elusive because of the limitations of human research. Some general principles have been recognized that emphasize the need for preclinical platforms approximating human cancers. For example, in each of the noted successes, single potent cancer drivers present in a significant fraction of patient malignancies were targeted; however, when only a minor fraction of patients are responsive, data from all-comer clinical trials may mask the responders. This was first demonstrated in non-small-cell lung cancer (NSCLC) trials that initially failed to show significant responsiveness to EGFR-targeted tyrosine kinase inhibitors; however, the ~10% of patients whose tumors actually harbored activating EGFR mutations were uniquely sensitive (Lynch et al., 2004; Paez et al., 2004). Screening of lung cancers for such mutations prior to therapy is now routine practice. Lung cancer is the most prevalent US cancer; if discovery is limited to clinical trials, accurately identifying therapies that are effective in a fraction of patients with less common cancer types may not be possible. Nonetheless, when a specific target was known, stratification of patients has identified additional effective therapies, such as inhibitors for BRAF-mutant melanomas and ALK translocation-positive NSCLCs (Pagliarini et al., 2015). Unfortunately, patients treated with single targeted therapies inevitably relapse with cancers that are resistant to the original drug.
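The masking of responders in all-comer trials described above comes down to weighted-average arithmetic. The sketch below is purely illustrative: only the ~10% EGFR-mutant fraction comes from the text, while the response rates (70% in biomarker-positive patients, 5% in biomarker-negative patients and in the control arm) are hypothetical assumptions, not data from the cited trials.

```python
def pooled_response_rate(positive_fraction: float,
                         rate_if_positive: float,
                         rate_if_negative: float) -> float:
    """Response rate observed in an unselected (all-comer) trial arm:
    the biomarker-weighted average of the two patient subgroups."""
    return (positive_fraction * rate_if_positive
            + (1.0 - positive_fraction) * rate_if_negative)


# ~10% EGFR-mutant patients (from the text); response rates are hypothetical.
POSITIVE_FRACTION = 0.10
RATE_MUTANT, RATE_WILD_TYPE, RATE_CONTROL = 0.70, 0.05, 0.05

treated = pooled_response_rate(POSITIVE_FRACTION, RATE_MUTANT, RATE_WILD_TYPE)
benefit_all_comers = treated - RATE_CONTROL      # diluted by the 90% non-responders
benefit_selected = RATE_MUTANT - RATE_CONTROL    # visible only after biomarker selection

print(f"all-comer response rate: {treated:.1%}")
print(f"absolute benefit, all-comers: {benefit_all_comers:.1%}")
print(f"absolute benefit, EGFR-mutant only: {benefit_selected:.1%}")
```

Under these assumed rates, the treated arm of an unselected trial responds at 11.5% versus 5% for control, a 6.5-point difference, whereas the same drug yields a 65-point difference in the selected subgroup; the all-comer benefit is simply the subgroup benefit scaled by the subgroup's 10% share.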

Another challenge in targeting single drivers is the feedback response triggered by disruption of the molecular network, which can prevent efficacy or even increase disease severity. Understanding such molecular responses can aid in the discovery of more effective combination therapies. In addition, unbiased molecular queries are showing promise in identifying signatures that correspond to prognosis and/or therapeutic outcomes. For example, in some cases, unique transcriptome signatures stratify cancers into distinct therapeutic and/or prognostic categories and thus improve patient management (e.g., Garraway, 2013). Thus far, this approach has been used primarily to determine which patients require aggressive chemotherapy, thereby reducing the frequency of overtreatment. The Oncotype DX and FDA-approved MammaPrint tests, both based on distinguishing transcriptome signatures, are now used in the clinic to identify low-risk breast cancer patients who can be spared aggressive treatment.
Figure 1. Targeting the Tumor and Its Microenvironment

Genetic alterations produce oncogenes that drive signaling pathways in cancer cells, facilitating survival and growth. However, tumor cells also cooperate with stromal cells, including vessels, fibroblasts, and various immune cells, to acquire growth factors, an energy supply, and protection from host defenses. These key autonomous and stromal mechanisms constitute potential therapeutic targets both locally and, for immune cells, also in the circulating blood and distant immune organs, as indicated by the numbers. (1) Cancer cell growth driven by an aberrant kinase (“Driver Gene”) can be targeted by small-molecule inhibitors. (2) Oncogenic signaling promoting uncontrolled cell cycling can be disrupted (e.g., antimetabolites, anti-microtubule agents, DNA-damaging agents). (3) Tumor growth requires the development of new vasculature to meet increased nutrient demands, which can be blocked by anti-angiogenic agents. (4 and 5) Growth of cancer cells stimulated by the release of either host-derived hormones (4, green arrow) or growth factors (5, blue arrows from blood vessels, fibroblasts, macrophages, and myeloid-derived suppressor cells [MDSC]) can be targeted by hormone inhibitors (e.g., anti-hormones or biosynthesis inhibitors) or growth factor receptor inhibitors, respectively. (6 and 7) Tumor cells can shift the inflammatory response to an immunosuppressive mode (e.g., activation of CTLA-4 and PD-1 in T cells or PD-L1 in cancer cells). The immunosuppressive environment can be reversed by treatment with immunomodulatory cytokines (6, modulator sign; e.g., IL-2, IL-15) or immune checkpoint inhibitors (7, modulator sign; e.g., anti-CTLA-4, anti-PD-1, or anti-PD-L1), restoring the anti-cancer activity of T cells. Left inset: key for therapeutic modes. Right inset: targeting agents. (Artwork adapted from design by Jonathan Marie.)
Yet, accuracy is not optimal, and numerous challenges currently prevent broad implementation of molecular signature diagnostics (van’t Veer and Bernards, 2008). Additionally, the hope is that molecular signatures can be identified via unbiased compound or molecular screens that will dictate specific effective treatments even when the targets are unknown.

Thus, although clearly impactful, the use of a cancer’s molecular constitution to guide clinical practice is in its infancy, and research to identify parameters that hone specificity and improve accuracy is ongoing. If confined to human research, achieving maximum effectiveness is likely impossible due to the low frequency of each molecular subtype within most cancers and the limitations associated with clinical trials. More challenging still is understanding the impact of complex and varied inherited genetic constitution on clinical outcomes, and subsequently translating that understanding into clinical practice (Hood and Friend, 2011). In this regard, the sophisticated evaluation of complex traits in mice using the Collaborative Cross and Diversity Outbred populations may offer a path to discovery (Churchill et al., 2004; Svenson et al., 2012).

The above summary provides only a cross-section of the therapeutic and diagnostic possibilities currently under investigation, and the reader is referred to current review articles for more comprehensive information (Chin et al., 2011; Hood and Friend, 2011; Yap et al., 2013). Ultimately, the current limitation to improving cancer patient care within reasonable timeframes may not be the availability of potentially efficacious therapies; rather, a major obstacle is the lack of a fully developed and integrated set of reliable preclinical technologies that can navigate complex variables in therapeutic responses and diagnostic accuracy. To optimally develop efficacious therapies, preclinical research must use a diversity of models that collectively incorporate the biology and genetics dictating therapeutic outcomes for specific cancers, and yet achieve sufficient throughput. Here we summarize the value and constraints of mouse cancer models, highlight recent progress indicating promise, summarize non-mammalian and ex vivo preclinical models, and explore the needs for, and challenges to, developing robust multifaceted preclinical platforms for routine use.

Mouse Cancer Models in Preclinical Research 

Murine cancer models designed to capture the complexities of human cancers currently offer the most advanced preclinical opportunity for navigating the diverse mechanisms that provide a rationale for therapeutic development (Van Dyke and Jacks, 2002). One approach is to probe mechanisms of pathobiology and design effective treatments by perturbation with molecularly targeted therapies (Olive and Tuveson, 2006). Additionally, the models are being used, and further developed, as preclinical platforms for determining efficacy to guide clinical trial designs (Singh et al., 2012). However, the application of complex cancer models to clinical research directives is an emerging science, currently executed in individual settings and with limited resources. Significant research, ideally in a team-directed, multi-institutional effort, is required to hone existing technologies into integrated preclinical workflows that optimally accelerate positive clinical outcomes.

A variety of approaches to mouse cancer modeling are now available (Figure 2), and each has strengths and weaknesses (Table 1). Here, we address the limitations of standard Cell line-Derived Xenograft (CDX) models; describe genetically and biologically engineered mouse cancer models [Genetically Engineered Mouse (GEM), GEM-Derived Allograft (GDA), and Patient-Derived Xenograft (PDX) models]; review their values and constraints; and highlight recent progress. Thus far, results indicate promise in understanding cancer pathobiology and in enhancing the prediction of clinical efficacy, but also underscore the need for further development to achieve consistent reliability.

Figure 2. Current State of Preclinical Cancer Modeling

Preclinical mouse models can be defined according to the species source of the tumor, how it is created, and how it is manipulated. (Upper panel) Tumors derived from human patients, and other non-murine species, can be directly transplanted into immunocompromised mice to form patient-derived xenograft (PDX) models; PDXs can also be established from circulating tumor cells (CTCs). Alternatively, these same tumors can be used to establish cell lines that are maintained in vitro as cell cultures and then transplanted into immunocompromised mice to form cell line-derived xenograft (CDX) models. Since the hosts of these tumors need to be immunocompromised, they are useful only for testing the efficacy of chemotherapeutics (Chemo) and targeted small-molecule inhibitors (Targeted). Xenograft models derived from canine patients also belong to this category, but are not shown here. (Lower panel) Mice can be engineered to generate tumors of human relevance with respect to histopathology, etiology, and molecular wiring. Offspring of such genetically engineered mice (GEM) can serve directly as preclinical models themselves, in which case the tumor is treated at its precise point of origin. Notably, model building can be streamlined by using non-germline approaches, one of which is to genetically modify ES cells and study the resulting chimeric mice without time-consuming breeding schemes. Alternatively, tumors harvested from GEMs can be transplanted and expanded into fully immunocompetent syngeneic hosts, forming GEM-derived allograft (GDA) models. Syngeneic models allow preclinical studies not only of chemotherapeutic and small-molecule drugs, but also of all varieties of immunotherapeutic agents (Immuno).

Traditional Mouse Models in Therapeutic Development 

Historically, preclinical mouse models have co-evolved with cancer therapy development (Figure 3). The earliest models were built through transplantation of murine tumors into immunocompetent host mice (DeVita and Chu, 2008; Talmadge et al., 2007). These early mouse-in-mouse isograft models served as workhorses for drug screening during the 1960s and 1970s, and were successful in identifying a number of effective cytotoxic drugs such as vincristine and procarbazine (DeVita and Chu, 2008). During the 1980s, researchers explored mechanisms of metastasis using selected murine and human tumor cell lines. A series of investigations by Fidler and colleagues demonstrated that metastasis is not random but site-selective (Fidler and Hart, 1982), and that metastatic patterns are injection site-dependent, supporting the establishment of “orthotopic” models (Talmadge et al., 2007). Since then, cancer therapeutic development has relied upon the more tractable CDX transplantation models, in which tumors develop after subcutaneous injection of in vitro-established human cancer cells into immunocompromised mice (Figure 2). The cell lines have been selected over many passages for rapid 2D growth on plastic in serum-containing media. The NCI60 cell line panel (DeVita and Chu, 2008; Talmadge et al., 2007) provided a valuable resource from which most CDXs were generated, and recent efforts have greatly expanded the repertoire (Reinhold et al., 2015). These models are easily established in a wide variety of laboratory settings and have been successfully used to identify an abundance of cytotoxic drugs, leading to chemotherapy treatments that still dominate clinical cancer management (Figure 3).

Unfortunately, CDXs have failed to predict human efficacy for most therapies targeted to cancer-driving proteins (Johnson et al., 2001), as evidenced by the low FDA approval rate of 5%–7% for targeted therapeutics (Sharpless and DePinho, 2006). With an average time from discovery to clinical practice of 12 years, at an average estimated cost of $0.5–$2.0 billion (Adams and Brantner, 2006) and an immeasurable human price, this low yield forestalls even the goal of chronically managing, rather than curing, cancers. The observation that most cancer therapeutics fail in phase II and III clinical efficacy assessments indicates that current standard preclinical practice inadequately addresses complex challenges to successful treatment, such as host immune responses, cancer heterogeneity, and drug resistance. Consequently, the system cannot be used to optimize a multitude of variables known to influence therapeutic outcomes, such as combinatorial therapies, dosing schedules, and drug delivery methods (Al-Lazikani et al., 2012). CDXs continue to be valuable in identifying non-targeted cytotoxic agents and in primary assessment of drug toxicity (Teicher, 2006), in analyzing resistance mechanisms (Garraway and Janne, 2012), and in triaging potentially effective targeted therapies for evaluation in more representative models.

Mouse Models Designed after Patient Cancers 

Mice and humans are believed to have diverged from each other ~87 million years ago (Bailey et al., 2013), so naturally there are numerous significant similarities between the two species, and also many marked disparities, including differences in immune systems and drug metabolism. On the premise that many cancers have been cured in mice but not in people, many argue that mice are inappropriate for use in therapeutic development (Leaf, 2004). However, it is critical to understand that “cures” have been attained only in CDX models; thus, dismissal of all mouse cancer models as irrelevant is unwarranted. Human cancers are enormously complex, and their evolutionary etiology generates vast diversity among and within them, thus challenging the attainment of successful treatments. However, as knowledge of cancer complexities has increased, so has the ability to design mouse models that better represent cancer patients. PDX and GEM models develop tumors with the greatest similarity to human diseases yet achieved, and the past 5 years have seen an increase in their use in preclinical research. As with all models, each approach has its strengths and limitations (Table 1). Early studies suggest promise for improved guidance in the development of successful clinical treatments (Table 2), yet also emphasize the need for further scrutiny and refinement. The following provides a balanced consideration of model advantages and limitations, their ramifications for obtaining optimally accurate preclinical data, and the logistical requirements for achieving efficiency, accuracy, and reproducibility.

Table 1. Comparison of Clinical and Preclinical Model Properties

Model | Immune Status of the Host | Microenvironment Context | Human Relevance | Tissue Availability | Experimental Complexity | Robustness | Disease Initiation/Progression | Feasibility in Pathway Engineering | Cost
Clinical Trials | Functional | Natural | Standard | Highly Limited | High | Low | Intact | Irrelevant | Very High
Patient-Derived Xenograft Models 

Relative to CDX models, immunocompromised mice bearing subcutaneously implanted, surgically derived clinical tumor samples (PDX models) are better aligned with human disease, since intact tissue that preserves tumor architecture is transferred directly to recipient mice and is not compromised by in vitro adaptation (Figure 2). PDXs are the only models harboring bona fide tumor targets directly from the patient, and hence their use in drug discovery is expanding rapidly. Promise for such models, first developed by Fiebig (Fiebig et al., 1984), was demonstrated when chemotherapeutic agents, such as alkaloids and anti-metabolites, were shown to elicit similar responses in mice and patients (Mattern et al., 1988). In contrast, a study of responses to numerous cytotoxic agents in NCI60-based CDX models showed that their predictive value for efficacy was much less impressive (Johnson et al., 2001). Unfortunately, early studies using PDXs were limited by difficulties in collecting clinical samples and in achieving sufficient take rates.

The recent resurgence of PDX model use for therapeutic evaluation has been fueled by significant improvements in clinical sample access and transplantation technology. Cancers established as PDXs can, in early passages, retain the stromal composition and histologic and molecular heterogeneity characteristic of those in patients (Hidalgo et al., 2014; Tentler et al., 2012). Since these properties critically impact therapeutic responses and biomarker specificity, PDX models provide a preclinical venue for addressing some of the most challenging barriers to successful patient therapy. Furthermore, human target specificity allows for direct evaluation of lead human-specific therapeutics, such as antibodies, in clinical development.

Methodologies for PDX establishment and characterization are detailed elsewhere (Hidalgo et al., 2014; Tentler et al., 2012; Zhang et al., 2013). For some cancers, such as certain melanomas and lung and colorectal cancers, transplant take rates can exceed 75%, and the time required for tumor growth can be as little as 2–4 months. However, these attributes vary widely depending on sample type and amount (e.g., fresh biopsy tissue, fine needle aspirate, circulating tumor cells), tumor origin, molecular properties, and recipient strain (see Supplemental Information). Consequently, some cancers, such as neuroendocrine, luminal ER+ breast, and prostate cancers (Rosfjord et al., 2014), are underrepresented. Notably, PDX engraftment appears to correlate significantly with clinical aggressiveness (Hie et al., 2015).

Figure 3. Timeline of Key Preclinical Cancer Model Developments since 1950

As the conceptual targets of cancer treatment progressed from actively dividing cells to oncogenic signaling and immune checkpoints, preclinical models (right side) and cancer therapies (left side) co-evolved accordingly. This evolution was highly dependent on technical advances, resulting in waves of activity. For example, recent development of fully immunocompromised mice and diverse syngeneic GEM models has significantly promoted PDX and GDA models, respectively, for preclinical cancer studies (the bracket).

Relative to subcutaneous transplants, cancers orthotopically transferred into their organs of origin are more likely to maintain tumor microenvironment characteristics that impact therapeutic outcomes (Talmadge et al., 2007). However, orthotopic PDX production is technically challenging, and, for most cancer types, tumor growth and responses must be monitored via expensive and often laborious longitudinal imaging. Thus, preclinical therapeutic studies currently use subcutaneous models exclusively.

PDX cohorts are produced by serial tumor transplantation, and, given the likelihood of change with each passage, therapeutic studies are most representative in low-passage models. Additionally, human stromal components are maintained for only 2–3 passages, with mouse stromal elements becoming dominant thereafter (Rosfjord et al., 2014). Unfortunately, if limited to early-passage use, each model represents a limited resource. Hence, most preclinical studies use models that have been expanded, banked, and developed into significantly sized cohorts. The extent to which this compromises accurate prediction of efficacy is presently undefined and likely depends on the mechanism of therapeutic activity. As such, in propagating PDXs, parental tumor traits should be routinely monitored, and deviations must be considered in interpreting therapeutic and biomarker data.
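One way to make routine monitoring of parental traits concrete is to score each passage against the parental tumor with a simple overlap measure. The sketch below is a hypothetical illustration, not an established pipeline: it assumes each sample is summarized as a set of somatic mutation identifiers (the identifiers shown are made up for the example), uses the Jaccard index as the similarity score, and flags passages that fall below an arbitrary 0.8 threshold.

```python
from __future__ import annotations

# Hypothetical drift check (illustrative only): samples are summarized as sets of
# mutation identifiers; the 0.8 threshold is an arbitrary assumption.


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A & B| / |A | B|; defined as 1.0 when both sets are empty."""
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


def flag_drift(parental: set[str], passages: dict[str, set[str]],
               threshold: float = 0.8) -> dict[str, tuple[float, bool]]:
    """Return {passage: (similarity_to_parental, flagged_for_review)}."""
    report = {}
    for name, calls in passages.items():
        score = jaccard(parental, calls)
        # A passage is flagged when it shares too little with the parental tumor.
        report[name] = (score, score < threshold)
    return report


if __name__ == "__main__":
    parental = {"KRAS:G12D", "TP53:R175H", "CDKN2A:del", "SMAD4:R361H"}
    passages = {
        "P1": {"KRAS:G12D", "TP53:R175H", "CDKN2A:del", "SMAD4:R361H"},
        "P4": {"KRAS:G12D", "TP53:R175H", "CDKN2A:del", "ARID1A:Q456*"},
    }
    for name, (score, flagged) in flag_drift(parental, passages).items():
        print(f"{name}: similarity={score:.2f}, flagged={flagged}")
```

Mutation calls are only one facet of the parental phenotype; histology, expression profiles, and stromal content can also drift, so any threshold of this kind would need to be set per study.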

To circumvent immune rejection, human cancers must be transplanted into immunocompromised mice. Commonly used recipients, such as nude, SCID, and NOD/SCID strains, vary in the extent of immune impairment (detailed in Supplemental Information). IL-2Rγ-deficient NOD/SCID mice (NSG and NOG strains) are the most severely impaired and often yield improved take rates. Critically, the requirement for immunocompromised hosts precludes assessment of emerging therapies designed to modulate immune function (e.g., the immune checkpoint inhibitors anti-CTLA-4, anti-PD-1, and anti-PD-L1). Moreover, therapeutic responses in general are likely influenced by preexisting cancer-dependent immune phenotypes and by immune responses elicited upon therapy-induced tumor perturbation (Zitvogel et al., 2008). The extent to which compromised immune systems limit predictive value for a given therapeutic approach will be determined as comparisons between PDX and clinical outcomes are expanded. Technologies to “humanize” the mouse immune system by transplanting purified human CD34+ hematopoietic stem cells into myeloablated NSG/NOG recipients (e.g., “BLT” mice: http://jaxservices.jax.org/invivo/humanized-BLT-mice.html) and other chimeric strategies have been developed (Legrand et al., 2009; Shultz et al., 2014). However, the high cost of recipient mice, limitations on human bone marrow acquisition, engraftment variability, and technical demands currently preclude use of these models in preclinical therapeutic discovery.

Despite the challenges to routine preclinical application, several PDX studies have proven effective in paralleling human outcomes (Malaney et al., 2014), in exploring drug resistance mechanisms (Das Thakur et al., 2013), and in identifying targets for second-line treatment (Girotti et al., 2015). Programs are also underway to employ PDX models in individualized precision cancer care. To date, this approach has been most successfully applied to pediatric patients with advanced sarcomas, who have shown the predicted response, sometimes to drugs not previously associated with this indication (Tentler et al., 2012). Patient-specific studies are currently limited by expense and by the relatively long and unpredictable times needed to establish test animals. Since current clinical trials generally involve patients who have undergone prior failed treatments, results may not always be obtainable in a beneficial timeframe.

Genetically Engineered Mouse Cancer Models 
Of all murine cancer models, GEMs provide the most complete representation of cancer development; cancers develop from initiation through progression, co-evolve with intrinsic stroma, and arise in hosts with an intact immune system (Figures 1 and 2). However, GEM models are the most challenging to work with effectively, and species differences must be carefully considered in experimental designs and interpretations. Extensive experience and infrastructure are required to ensure the use of optimally accurate models and to achieve sufficiently populated, well-controlled preclinical studies. Yet, GEM cancer models provide the only opportunity to evaluate drug delivery, therapeutic response, and biomarker expression for cancers evolving within their natural microenvironment (autochthonous cancers). These complex dynamic processes contribute to overall disease properties and, in particular, constitute a source of the inter- and intra-tumoral heterogeneity that challenges successful therapeutic development. Additionally, the accuracy of preclinical testing for some therapeutic interventions, such as those targeting the immune system, may depend on the disease having evolved in place rather than having been transplanted. Indeed, GEMs and GEM-derived models are currently the only preclinical platform for evaluation and optimization of immunomodulatory therapies. Although some immune properties differ between mouse and human, there is significant conservation (Bailey et al., 2013); moreover, many differences can be managed via data interpretation or minimized by using genetically engineered “humanized” models (Scheer et al., 2013). Finally, autochthonous GEMs are the only viable models for evaluating prevention therapies.

Table 2. Representative Clinically Relevant Mouse Trials

Trial Design | Cancer Type | Model Type | Engineered Drivers | Drugs/Treatment | Significance | Relevant Publications
Preclinical | Hematopoietic (APL) | GEM | PML-RARα fusion; PLZF-RARα fusion | Retinoic acid | Demonstrated the efficacy of retinoic acid plus As2O3 in specific APL subtypes, validated in clinic | (Ablain and de Thé, 2014; Pandolfi, 2001)
Preclinical | Pancreas (neuroendocrine) | GEM | RIP1-Tag2 | Sunitinib | Demonstrated the efficacy of sunitinib plus imatinib, validated in clinic; sunitinib FDA approved for pancreatic neuroendocrine tumor treatment in 2011 | (Pietras and Hanahan, 2005; Raymond et al., 2011)
Preclinical | Medulloblastoma | GEM | Ptc1+/−; p53−/− | GDC-0449 (SMO inhibitor) | Demonstrated the efficacy of a small-molecule Shh pathway inhibitor, validated in clinic | (Romer et al., 2004; Rudin et al., 2009)
Preclinical | Pancreas (neuroendocrine) | GEM | RIP1-Tag2 | Erlotinib; rapamycin | Demonstrated the efficacy of combining drugs targeting EGFR and mTOR | (Chiu et al., 2010)
Co-clinical | Pancreas (PDA) | GEM | LSL-KrasG12D; LSL-Trp53R172H; Pdx-1-Cre | Gemcitabine; nab-paclitaxel | Provided mechanistic insight into clinical cooperation between gemcitabine and nab-paclitaxel | (Frese et al., 2012; Goldstein et al., 2015)
Co-clinical | Pancreas (PDA) | GEM | LSL-KrasG12D; LSL-Trp53R172H; Pdx-1-Cre | CD40 monoclonal antibody; gemcitabine | Demonstrated that targeting stroma was effective in treatment of metastatic PDA | (Beatty et al., 2013)
Co-clinical | Lung (NSCLC) | GEM | KRASG12D; p53fl/fl; Lkb1fl/fl | Selumetinib; docetaxel | Validated the improved response from adding selumetinib to docetaxel treatment | (Chen et al., 2012; Janne et al., 2013)
Co-clinical | Lung (NSCLC) | GEM | EML4-ALK fusion | Crizotinib; docetaxel; pemetrexed | GEM model predicted clinical outcome of drug combinations | (Chen et al., 2014; Lunardi and Pandolfi, 2015)
Co-clinical | Various sarcomas | PDX | N/A | Various chemotherapies | PDX testing predicted clinical outcome of drug combinations | (Stebbing et al., 2014)
Postclinical | Ovarian (SEOC) | GDA; PDX | RB/p53-deficient; BRCA1/2-deficient | Olaparib; cisplatin | Validated treatment efficacy in BRCA-mutant tumors in both GDA and PDX models | (Kortmann et al., 2011; Szabova et al., 2014)
Postclinical | Pancreas (neuroendocrine) | GDA | RIP1-Tag2 | Anti-VEGFR1 and anti-VEGFR2 antibodies | Identified mechanisms of resistance to anti-angiogenic therapies | (Casanovas et al., 2005)
Biomarker | Lung (NSCLC) | GEM; carcinogen-induced | Various models | N/A | Used in-depth quantitative MS-based proteomics to profile plasma proteins | (Hanash and Taguchi, 2011)
Biomarker | Pancreas (PDA) | GEM | KrasG12D; Ink4a/Arf fl/fl; Pdx-1-Cre | N/A | Used in-depth proteomic analyses to identify candidate markers applicable to human cancer | (Faca et al., 2008)

Several reports show that well-designed GEM studies can contribute to improved clinical trials (Table 2), not only by identifying potentially efficacious therapies but also by predicting both positive and detrimental effects in molecular subclasses. A major strength of GEM approaches is the flexibility to create models with precise molecular specificity. With increasing sophistication, several strategies (summarized below and detailed elsewhere [Abate-Shen et al., 2014]) have been employed over the past three decades to significantly enrich our understanding of cancer mechanisms. A plethora of genes frequently altered in human cancers have been validated as disease drivers in GEMs, thereby facilitating the evaluation of cancer evolutionary mechanisms and kinetics, susceptible cell and molecular targets, the relative roles of cancer cells and the microenvironment, and mechanisms of invasion and metastasis. Indeed, entire natural disease histories can be mapped (Stiedl et al., 2015; Van Dyke and Jacks, 2002).

In the process of basic discovery, countless GEM cancer models representing a variety of histologic cancer types driven by multiple independent drivers have been produced, and many are currently used in preclinical evaluations. Although no model can perfectly capture the human condition, several GEM models tractable for preclinical studies develop cancers with remarkable molecular and pathologic similarity to their human counterparts. However, since most established GEMs were created to address basic mechanisms, many do not accurately model human disease and/or are intractable for effective preclinical evaluation. Furthermore, each engineering approach can elicit untoward anomalies. Such anomalies can be accommodated in the interpretation of mechanistic studies, but they are the basis for excluding many models from effective preclinical research. Thus, choosing appropriate models as subjects for preclinical discovery requires a deep understanding of cancer biology and genetics and also of engineering modalities. The following provides a reasonable guide for optimizing the value of GEM-based preclinical platforms.



Germline GEMs 

An extensive array of technologies is employed to engineer the mouse germline with great precision. By editing the genome of embryonic stem (ES) cells or zygotes, mice can be programmed for cell-type-specific disruption of tumor suppressor genes via direct mutation or expression of interfering non-coding RNAs (RNAi) (Walrath et al., 2010), and for oncogene expression at physiological or cancer-analogous levels. Furthermore, mice can be “humanized” by engineering the expression of drug targets in relevant cell types (Scheer et al., 2013) so that human-specific targeted therapies, such as antibody-based drugs, can be tested in GEM models. While traditional methods for constructing locus-specific genetic changes require significant lead times for engineering, the recent development of rapid sequence-targeted approaches (Mou et al., 2015) has significantly reduced this time to weeks instead of many months. In particular, clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 technology, which is efficient and versatile, is accelerating germline engineering and also facilitating rapid somatic engineering (see below).

Depending on the strategy, expression of an engineered “event” can be constitutive or inducible, although gene induction with cell-type and temporal specificity provides the best possibility for accurately modeling disease development. Inducibility is achieved by combining cell-specific expression of transcription factors (e.g., doxycycline-modulated tet-transactivators) or recombinases (Cre-lox or Flp-FRT) with cognate cis elements linked to a target gene, or by expressing proteins fused with a hormone-responsive domain (e.g., the tamoxifen-inducible estrogen receptor domain) (see Supplemental Information). When multiple distinct inducible systems are combined within the same cancer model, cancer-specific mutations can be induced sequentially in order to map and emulate cancer evolution (e.g., Young et al., 2011) and thus to generate increasingly relevant preclinical models. Reversible inducibility can be achieved with each of these technologies, although small molecule-mediated modulation of transcription factors and hormone-responsive domains is the most tractable means of toggling expression on and off (Abate-Shen et al., 2014; Texido, 2013). This approach facilitates the identification of events required to sustain tumor growth (“oncogene addiction”) (e.g., Soucek et al., 2008) and thus of potential therapeutic targets (e.g., Kwong et al., 2012). Tumor responses to the shutdown of oncogenes or the restoration of functional tumor suppressors within tumors, or within appropriate effector cells, indicate the potential efficacy of targeted therapies, while genetic ablation in the entire animal predicts the overall toxic effects of specific inhibitors. However, since off-target effects will not register in this approach, results indicate only whether a given therapy is potentially efficacious.
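The logic of combining a drug-toggled system with a recombinase in one model can be made concrete with a toy state model. The sketch below is illustrative only and assumes a hypothetical mouse in which a Tet-On (rtTA) driver controls a TRE-linked oncogene and a CreER driver controls a floxed tumor-suppressor allele; the class and function names are invented for this example. For simplicity it treats Cre-mediated excision as a one-way event and does not attempt to capture the reversible recombinase designs mentioned above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TargetCell:
    """One hypothetical target cell carrying two inducible systems."""
    rtta_expressed: bool = True       # cell-type-specific promoter drives rtTA (Tet-On)
    creer_expressed: bool = True      # cell-type-specific promoter drives CreER
    suppressor_deleted: bool = False  # has the floxed suppressor allele been excised?

    def step(self, dox: bool, tamoxifen: bool) -> tuple[bool, bool]:
        """Advance one time step; return (oncogene_on, suppressor_lost)."""
        # CreER acts in the nucleus only when bound by tamoxifen; in this toy model
        # the resulting excision is a permanent, heritable DNA change.
        if self.creer_expressed and tamoxifen:
            self.suppressor_deleted = True
        # Tet-On: rtTA activates the TRE-linked oncogene only while doxycycline
        # is present, so withdrawing the drug switches the oncogene back off.
        oncogene_on = self.rtta_expressed and dox
        return oncogene_on, self.suppressor_deleted


def simulate(schedule: list[tuple[bool, bool]]) -> list[tuple[bool, bool]]:
    """Run one cell through a list of (doxycycline, tamoxifen) exposures."""
    cell = TargetCell()
    return [cell.step(dox, tam) for dox, tam in schedule]


if __name__ == "__main__":
    # Sequential induction, then doxycycline withdrawal to probe whether the
    # tumor remains dependent on ("addicted" to) the oncogene.
    schedule = [
        (True, False),   # oncogene induced first
        (True, True),    # tamoxifen pulse deletes the suppressor
        (True, False),   # tamoxifen cleared; deletion persists
        (False, False),  # doxycycline withdrawn; oncogene switched off
    ]
    for t, (onc, lost) in enumerate(simulate(schedule)):
        print(f"t{t}: oncogene_on={onc}, suppressor_lost={lost}")
```

The asymmetry the model encodes, a drug-toggled oncogene alongside a one-time genetic lesion, is what lets a withdrawal experiment ask whether tumors still depend on the oncogene after a second hit has accumulated.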

A critical, often overlooked, consideration when building GEM cancer models is the incorporation of known environmental etiologies. However, there are notable examples wherein certain environmental factors were validated as etiologic agents and thus produced representative cancer models, including HPV E6/E7-induced cervical cancer (Riley et al., 2003), UV-accelerated melanomas (BRAFV600E [Cao et al., 2013], mutant HRAS [Kannan et al., 2003], and HGF/MET [Noonan et al., 200] models), and Helicobacter-fueled gastrointestinal cancers (Rogers and Fox, 2004). Exposure of GEM cancer models to environmental mutagens can be used to approximate the mutation load of many human cancers (e.g., Westcott et al., 2015), which influences therapeutic outcomes such as drug-resistant relapse and neoantigen load-dependent immunomodulation.

The extent to which findings in GEMs extend to patients depends on engineering mice based on our understanding of human cancer etiologic drivers, cellular origins, heterogeneity, pathogenesis, and clinical properties. To recapitulate human cancer development, clinically relevant driver gene(s) or pathways must be perturbed in relevant target cells. For adult cancers, gene expression should be targeted to adult, rather than developing, organs. Furthermore, for optimal modeling, cancers should progress in a relevant sequence, since the order of events impacts the properties of evolving tumors. Ideally, both initiation and progression to aggressive cancer should be evaluated using individual and relevant combinations of molecular aberrations thought to be causal in humans. High phenotypic penetrance and consistency among animals within a lineage are essential for tractability.

The accuracy of disease modeling depends on actually achieving the specificity envisioned in experimental designs, which is not always realized because of technical limitations and/or gaps in current knowledge. Of course, engineered sequences must be validated, but it is also critical that expected transcriptional specificities be confirmed. Unless targeted to specific genomic locations, transgenes insert randomly, and expression can be dramatically altered depending on insertion sites. Furthermore, transgenes may not carry all necessary regulatory signals. Hence, several founder lines should be established and fully characterized before selecting accurate representative lines for modeling cancers. Even targeted genetic changes have the potential to alter gene regulation. Thus, the specificity, levels, and range of expression must be evaluated for each model; aberrant expression usually alters disease and can also yield ectopic phenotypes that hinder tractability and invalidate data. Yet, a surprising number of existing engineered strains, including those driving inducible expression, are not fully characterized. Hence, when choosing cancer models for preclinical studies, it is essential that expression and disease patterns are well established and accurately represented (see Supplemental Information).

Non-Germline GEM Models 

While autochthonous GEM models have great utility, most are not tractable for large-scale screening of multiple anticancer drug candidates because of high cost, long timelines, extensive and complex breeding, and/or difficulty in obtaining synchronous tumorigenesis. Preclinical analysis of metastatic lesions is particularly challenging: primary tumors arise stochastically with no reliable timetable, as in humans, and multiple tumors often develop. Thus, extensive longitudinal tomographic imaging is required to enroll mice bearing similarly sized tumors for therapeutic evaluation (e.g., Weaver et al., 2012). Such procedures require specialized expertise and can be too expensive and time-consuming for first-line drug screening. Several strategies to produce “non-germline” GEMs have been developed that bypass breeding, reduce expense, and, in some cases, improve flexibility, uniformity, and timelines (Heyer et al., 2010).



GEM-Derived Allograft Models. GEM-derived allograft (GDA) models combine the genetic and biological similarity of GEM models to human cancer with the relative technical ease of PDX transplantation (Heyer et al., 2010). Without in vitro manipulation, tissue fragments derived from tailor-made GEM tumors are expanded by transplantation, orthotopically or subcutaneously, into immunocompetent syngeneic hosts (Figure 2). Thus, tumors can be banked to facilitate production of large cohorts, and efficacy studies can be performed in industry-friendly timeframes (~3-8 weeks), allowing for increased throughput. Indeed, a battery of treatments can be evaluated in GDAs prior to (preclinical) or in parallel with (“co-clinical”) clinical trials (Table 2). As with GEMs, immune systems are fully functional in GDAs, and interactions between tumor cells and their intrinsic microenvironments are maintained.

GDAs are particularly amenable to the evaluation of metastatic disease, which is responsible for most cancer-related deaths and is rarely assessed preclinically. In GDAs, metastases arise from single primary tumors, which can be resected to allow time for metastatic progression (Day et al., 2012). This approach also emulates clinical standards of care for many cancers and facilitates comparison of therapeutic responses in primary and metastatic disease derived from the same GEM cancer (Figure 4).

As with PDXs, serial passaging increases the likelihood that tumor properties will deviate from parental samples due to further evolution and/or selective growth of subcompartments; thus, transplanted tumors should be monitored for molecular and biological similarity to founding tumors. Additionally, since transplanted tumors do not evolve in situ, GDAs cannot legitimately be used for prevention studies, and some therapeutic outcomes may differ between autochthonous GEMs and GDAs. Given this potential trade-off of accuracy for tractability, candidate therapies that are efficacious in GDAs should subsequently be validated in the original GEM models prior to clinical studies.

Stem Cell-Derived Chimeras and Somatic Models. Mice chimeric for genetically engineered cells are created by implanting GEM-derived or genetically manipulated ES cells into pre-implantation embryos. Since oncogenic alleles are engineered ex vivo in ES cells, many mice with the desired genetic composition can be generated without complex, laborious, and long-term breeding schemes. The potential value of this approach was first highlighted in the production, analysis, and preclinical evaluation of lung adenocarcinoma models (Zhou et al., 2010). Once constructed, ES cells harboring the desired alleles can be derived from blastocysts produced by a penultimate cross. In turn, this bankable resource can be used to generate mice chimeric for mutant and wild-type cells (Premsrirut et al., 2011). The same strategy facilitates conditional RNAi-mediated knockdown of target expression through manipulation of ES cells, which can then be used to generate chimeric mouse cohorts (Dow et al., 2012).

Notably, the advent of CRISPR/Cas9 technology, and with it the ability to perform complex gene editing with relative ease and speed, has dramatically enhanced the value of non-germline GEM approaches. Several groups have precisely modified oncogenes and tumor suppressor genes directly in somatic cells of adult mice, significantly improving the feasibility and flexibility of this genetic engineering approach (Chen et al., 2015; Dow et al., 2015; Maddalo et al., 2014; Platt et al., 2014). These models also mimic human cancer better than standard germline GEMs do, in that tumors typically arise from fewer cells in the context of normal stroma.

Figure 4. Generation and Application of Metastatic GDA Models

(I) GDAs are derived from tumors arising in mice genetically tailored to produce human-relevant models. Relevance can be further enhanced by including appropriate etiological agents (lightning bolt). Arising tumors are resected, labeled with imageable markers (green), and directly transplanted into fully immunocompetent syngeneic mice at either subcutaneous or orthotopic sites. The imageable markers allow monitoring of tumor growth and drug response, and/or FACS purification for analysis. Once successfully transplanted, GDAs can be expanded for banking and/or preclinical studies. Mice bearing GDAs can be treated directly with individual or combination drugs (*) to study therapeutic efficacy at the “primary” tumor site (II). (III and IV) Alternatively, GDAs can be resected using survival surgery, and treatments can focus on metastatic disease, simulating first-line treatment in human patients following primary tumor resection. GDA models allow for interventive treatment of metastatic disease once detected (III), or preventive adjuvant treatment initiated immediately following surgical resection (IV). GDA models are thus well suited for studying primary or metastatic disease, with interventive or preventive approaches using pathway-targeted small-molecule and/or immunotherapeutic agents.

In a variation of the non-germline GEM approach, genetically engineered stem or progenitor cells can be transplanted into syngeneic mice, where they can home to appropriate tissue targets and become the cells of origin for developing tumors (Heyer et al., 2010). These models are especially amenable to studying hematopoietic cancers, in which the stem cells are well characterized and the host can be prepared to receive transplanted cells by irradiation, which creates a favorable niche for the engineered hematopoietic stem/progenitor cells to colonize. Successes have also been reported for other cancers (Heyer et al., 2010).

Logistics for Optimizing Preclinical Studies 

The extensive complexities that impede successful drug development in cancer patients dictate that faithful murine cancer models must themselves be complex. Both PDX- and GEM-based models offer this opportunity. However, their very complexity requires that informative models be generated and characterized with substantial knowledge of cancer mechanisms and modeling limitations, rigorous animal maintenance and production, routine phenotypic and genetic monitoring, appropriate strategies for evaluating therapeutic responses, and consideration of the multiple variables that affect data interpretation. For therapeutic and biomarker development to routinely and positively influence patient care, preclinical studies must be (1) well powered, with adequate cohort sizes and several evaluation parameters, (2) goal oriented and efficiently executed, and (3) highly reproducible.



Experimental Considerations 

Once models that optimally represent human disease have been selected, clinical relevance relies on experimental parameters that are comparable and/or translatable to human practice. These include, but are not limited to, dosing levels and schedules, drug pharmacology, response evaluation methods, and endpoint choices. The pharmacokinetics (PK) of therapeutic agents and, when their targets are known, their ability to modify those targets (pharmacodynamics; PD) should be measured in tumor-bearing mice. The fate of administered drugs is largely determined by drug-metabolizing enzymes essential for their absorption, distribution, metabolism, and excretion (ADME). Therefore, differences between mice and humans in the central family of metabolizing enzymes, the cytochrome P450 (CYP) family, are a confounding factor in extrapolating drug PK and the responses it elicits. Since the maximum tolerated dose (MTD) of many drugs is significantly higher in mice than in humans, it is essential to evaluate efficacy using doses achievable in patients. However, this is possible only when human PK is known: for example, when repurposing FDA-approved drugs, in preclinical evaluation of combination therapies that comprise single phase II agents, and in co-clinical experimentation wherein mouse and human evaluations are performed in parallel, such that clinical toxicity results are available. Even when appropriate human dosing is known, there is no simple formula for approximating comparable doses that achieve the same PK in mice; instead, experimental determination is required (Sparreboom et al., 1996). Yet, when numerous agents are being evaluated, this approach is not feasible; rather, subsequent coordination with clinical results and further preclinical dose escalation experiments are needed for optimal response assessment.
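As a minimal sketch of the exposure-matching idea above, assuming that the area under the plasma concentration-time curve (AUC) is the PK metric being matched (the text does not specify one), the Python below computes a non-compartmental AUC with the linear trapezoidal rule and expresses a mouse regimen's exposure relative to a human reference value. The function names, sampling times, and concentrations are hypothetical illustrations, not data or methods from the cited studies.

```python
def auc_trapezoid(times, concentrations):
    """Area under a concentration-time curve by the linear trapezoidal rule."""
    if len(times) != len(concentrations) or len(times) < 2:
        raise ValueError("need at least two paired time/concentration samples")
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("sampling times must be strictly increasing")
        # Each interval contributes the area of a trapezoid.
        area += dt * (concentrations[i] + concentrations[i - 1]) / 2.0
    return area


def exposure_ratio(mouse_times, mouse_conc, human_auc):
    """Measured mouse AUC divided by a known human AUC (1.0 = matched exposure)."""
    return auc_trapezoid(mouse_times, mouse_conc) / human_auc


if __name__ == "__main__":
    # Hypothetical plasma sampling: hours after dosing, arbitrary concentration units.
    t = [0, 1, 2, 4]
    c = [0, 10, 8, 4]
    print(auc_trapezoid(t, c))          # 26.0
    print(exposure_ratio(t, c, 52.0))   # 0.5
```

A ratio well above 1.0 would flag a mouse regimen (for example, one dosed at the mouse MTD) whose exposure exceeds what is achievable in patients. Consistent with the text, the mouse curve in such a comparison would still have to be measured experimentally rather than predicted from the human dose.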
In an effort to apply a genetic solution to the PK problem, a number of humanized CYP GEMs have been developed (Gonzalez, 2004; Scheer and Wolf, 2014). Despite these advances, the humanized alleles have not yet been incorporated into GEM cancer models or PDX recipients. Such an undertaking will require significant resources, substantial time, and a community effort to generate and evaluate revised models. Nonetheless, the investment will be worthwhile if it narrows the gap between laboratory mice and patients.

The choice of preclinical experimental endpoints for determining therapeutic responses is also critical for achieving outcomes most representative of those in patients (Talmadge et al., 2007). In prevention studies, efficacy is based on disease-free or minimized-disease status. For intervention therapy, efficacy should be judged by overall survival and not solely by tumor growth inhibition. The importance of survival endpoints is highlighted by a pancreatic cancer clinical trial designed on the basis of short-term GEM studies, which demonstrated reduced tumor volumes in response to sonic hedgehog pathway inhibition combined with gemcitabine (standard of care) compared to gemcitabine alone (Olive et al., 2009). Unfortunately, the trial was terminated early due to increased disease dissemination and poor patient survival. However, subsequent survival studies in the GEM model replicated the clinical result, demonstrating that initial drug effects did not predict survival outcomes. Hence, the model appropriately predicted patient responses, but only with a meaningful endpoint (Couzin-Frankel, 2014).
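To illustrate what a survival endpoint records beyond a single tumor-volume readout, the sketch below implements the standard Kaplan-Meier product-limit estimator for one treatment arm, treating animals removed from study without an observed death as censored. This is an illustrative assumption about how survival data might be tabulated, not the analysis used in the pancreatic cancer studies cited above; the function names and example data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate for one arm.

    times  -- time to death or censoring for each animal
    events -- 1 if death was observed, 0 if the animal was censored
    Returns a list of (time, estimated survival) at each time with a death.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group every animal whose death or censoring falls at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # censored animals leave the risk set after time t
    return curve


def median_survival(curve):
    """First time at which estimated survival falls to 0.5 or below (None if never)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


if __name__ == "__main__":
    # Hypothetical arm: days on study, 1 = death observed, 0 = censored.
    days = [10, 20, 20, 30, 40]
    died = [1, 1, 0, 1, 0]
    curve = kaplan_meier(days, died)
    print(curve)
    print(median_survival(curve))  # 30
```

Curves like this, estimated separately for each arm and compared with an appropriate survival test, capture late events such as the dissemination described above that an early tumor-volume comparison can miss.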

Tumor growth and therapeutic responses in subcutaneous transplant models, such as CDXs, PDXs, and GDAs, can be monitored by standard caliper measurement. Tumor growth in autochthonous and orthotopic transplant models (other than skin and breast models) and in all metastatic models must be monitored by longitudinal imaging strategies (Wang et al., 2015). High-resolution 3D images are compiled from sectional images generated by tomographic scanning of signals from X-rays (CAT), magnetic field-excited atoms (MRI), or injected radioactive tracers (SPECT; PET) (Supplemental Information). Tomographic imaging requires specific expertise for accurate execution and is relatively expensive and time-consuming. Optical imaging, which detects light emitted by excited fluorescent chromophores (e.g., jellyfish GFP) or by bioluminescent reactions (e.g., firefly luciferase), can be employed for real-time detection and is cost- and time-effective; however, these methods do not produce accurate tomographic data and are limited by tissue absorption. Notably, the traceable marker proteins required for optical imaging are xenogeneic with respect to mammals and can induce immune responses in immunocompetent mice, which can result in inconsistent activity, graft rejection, and/or inhibition of metastasis, confounding data interpretation. Hence, effective use of xenogeneic reporters is restricted to short-term studies or studies in immunocompromised models, limiting their usefulness in preclinical science (Steinbauer et al., 2003). However, this problem can be circumvented, at least in part, by employing host mice genetically engineered to express the respective markers from an early age, which elicits tolerance and thus recognition of the markers as “self” (Day et al., 2014).
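For the caliper-based monitoring described above, a common convention (assumed here; the text does not specify a formula) estimates subcutaneous tumor volume with the modified ellipsoid formula V = L × W² / 2, where L is the longer diameter and W the perpendicular, shorter diameter. The sketch below applies that formula and normalizes each measurement to the volume at treatment start, so that growth curves from tumors of different starting size can be plotted on a common scale. Function names and measurements are hypothetical.

```python
def caliper_volume(length_mm, width_mm):
    """Approximate tumor volume (mm^3) from two perpendicular caliper diameters.

    Assumes the modified ellipsoid formula V = L * W**2 / 2, with L the longer
    and W the shorter of the two measurements.
    """
    longer, shorter = max(length_mm, width_mm), min(length_mm, width_mm)
    return longer * shorter ** 2 / 2.0


def relative_volumes(measurements):
    """Volume at each time point divided by the volume at treatment start.

    `measurements` is a list of (length_mm, width_mm) pairs in time order.
    """
    baseline = caliper_volume(*measurements[0])
    if baseline <= 0:
        raise ValueError("baseline tumor volume must be positive")
    return [caliper_volume(length, width) / baseline for length, width in measurements]


if __name__ == "__main__":
    series = [(10, 8), (12, 8), (10, 4)]  # hypothetical caliper readings in mm
    print(caliper_volume(10, 8))          # 320.0
    print(relative_volumes(series))       # [1.0, 1.2, 0.25]
```

Normalization only rescales the data; it does not remove the biological influence of tumor size at the start of dosing, so cohorts still need to be enrolled at comparable sizes.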



Several additional points associated with preclinical trial design are worth emphasizing. Tumor mass is a critical factor in preclinical studies; vastly different outcomes can result from initiating drug dosing at different tumor sizes. Moreover, human tumors are typically much larger than their mouse counterparts, which could affect how preclinical data translate to the clinic. It is also vital to run preclinical trials with a sufficient number of animals in each experimental arm to reliably detect meaningful treatment effects; ensuring statistical power must be considered a priority for any preclinical study. Therefore, it is prudent to consult biostatisticians before finalizing study designs. Finally, the influence of genetic background on tumor behavior can be significant and must be considered when designing model systems. Generation and analysis of mouse cancer models within the Collaborative Cross, a large panel of inbred mouse strains (Churchill et al., 2004), could also provide important insights into the impact of complex germline genetics on tumor predisposition and drug response.
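As a rough illustration of why cohort size deserves a biostatistician's attention, the sketch below uses the textbook normal-approximation formula for a two-sided, two-sample comparison of means, n = 2((z₁₋α/₂ + z₁₋β) / d)², where d is the expected difference between arms divided by the common standard deviation. It is a planning sketch only: the function name is hypothetical, the approximation slightly underestimates the size an exact t-test would require, and survival or repeated-measures designs need different methods.

```python
import math
from statistics import NormalDist


def animals_per_arm(effect_size, alpha=0.05, power=0.80):
    """Approximate animals per arm for a two-sided, two-sample comparison of means.

    effect_size -- standardized effect d = (difference in means) / SD
    alpha       -- two-sided type I error rate
    power       -- desired probability of detecting an effect of size d
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    std_normal = NormalDist()                     # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = std_normal.inv_cdf(power)            # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


if __name__ == "__main__":
    for d in (0.5, 1.0, 2.0):
        print(d, animals_per_arm(d))
```

With the defaults, a large standardized effect (d = 1.0) calls for 16 animals per arm, whereas a moderate one (d = 0.5) calls for 63, which shows how quickly the pilot-scale cohorts discussed below become underpowered.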

Infrastructure 

Critical work establishing the utility of murine cancer models in preclinical research has taken place in independent laboratories over the last 20 years. However, because of severe resource limitations, the absolute need to perpetuate basic investigator-driven mechanistic discovery, and an increasingly competitive environment wherein success is measured by individual merit, the opportunity for laboratories to execute preclinical studies beyond the pilot level is limited. Recent reports indicate that most preclinical outcomes at this level are not reproduced when studies are not conducted with robust experimental standards, such as inclusion of appropriate positive and negative controls, execution with sufficient statistical power, attention to pharmacological considerations, and implementation of blinded evaluations (Begley and Ellis, 2012; Begley and Ioannidis, 2015). Adherence to all these standards is simply not possible in individual laboratories under current conditions. To increase access to preclinical evaluation in murine cancer models, several institutions have established core facilities that perform studies using dedicated staff and common methodologies. These cores represent a necessary step toward improving the reproducibility of preclinical outcomes. Yet, most core facilities do not have the resources to establish the full range of skills and technologies indicated in “Experimental Considerations” above to ensure optimal quality and replication of clinical approaches. Additionally, conducting well-powered blinded studies requires a sizable dedicated staff, which is generally not achievable in academic cores. Finally, global improvement of murine preclinical research must include the generation of an increased range of well-characterized, technically tractable, and optimally accurate models vetted for preclinical evaluation, along with the development of exportable standard operating procedures (SOPs).

To address these needs, over the past decade several organized efforts have been established that are dedicated to: (1) improving the accuracy and reproducibility of preclinical drug development platforms; (2) developing and exporting SOPs and models; (3) understanding cancer pathobiology through targeted therapeutics; and/or (4) applying the outcomes of optimized preclinical therapeutic and biomarker studies to clinical research for improved patient care. Common attributes in each case include: (1) a sufficient number of dedicated staff covering a broad range of expertise; (2) access to sophisticated instrumentation and technology for a full range of small-animal imaging modalities, histological and molecular pathology, genomic technologies, pharmacological methods, model generation, and appropriate maintenance and quality control for a large “bank” of models; and (3) data management strategies. Examples of such organizations include the Center for Advanced Preclinical Research (CAPR; Center for Cancer Research, National Cancer Institute and the Frederick National Laboratory for Cancer Research, https://ccr.cancer.gov/capr-home); the Mouse Clinic for Cancer and Aging Research (MCCA; Netherlands Cancer Institute and the European Research Institute for the Biology of Aging, http://www.mccanet.nl/); the Center for Co-Clinical Trials (MD Anderson, http://www.cancermoonshots.org/platforms/center-for-co-clinical-trials/); and the Co-Clinical Project: Informing Clinical Trials Using Preclinical Mouse Models (Harvard Medical School). Similar efforts focused specifically on pancreatic ductal carcinoma include the Mouse Hospitals (Columbia Medical School, http://www.olivelab.org/mouse-hospital.html, and Cold Spring Harbor Laboratory).

Table 3. Future Challenges and Possible Solutions for Mouse Preclinical Cancer Trials

Issue: Model improvement
  Challenge: More precise spatial and temporal control of genetic alterations in mouse tissues
  Possible solutions: Improve technologies for genomic editing (e.g., CRISPR) and for regulating gene activity
  Challenge: Human relevance of stroma, immune system, and therapeutic targets in mouse cancer models
  Possible solutions: “Humanize” genes via genetic engineering, and the immune system by reconstitution with human hematopoietic stem cells
  Challenge: Recapitulation of the tumor heterogeneity found in human cancers
  Possible solutions: Introduce environmental etiological factors (e.g., UV in skin cancer models); allow tumor evolution by avoiding inappropriately dominant oncogenic drivers

Issue: Study setting
  Challenge: Difficulties in diagnosing and treating large cohorts of mice as individual patients
  Possible solutions: Synchronize tumorigenesis by adopting inducible GEM or transplantable GDA systems
  Challenge: Disease progression and clinically relevant endpoints in preclinical studies
  Possible solutions: Improve biomarkers and imaging techniques for tumor tracking; adopt clinically relevant endpoints (e.g., progression-free survival)
  Challenge: Integration of pathologic, genomic, bioinformatic, molecular, and immunological analyses
  Possible solutions: Develop and share improved, standardized protocols; organize workflows with core facilities

Issue: Extrapolation to human disease
  Challenge: Evaluating effects of lifestyle on therapeutic outcomes
  Possible solutions: Consider gender, diet, and exposure to environmental factors in protocol development; consider effects of microbiota
  Challenge: Physiological differences between mouse and human
  Possible solutions: “Humanize” aspects of mice; consider scaling laws in PK/PD, lifespan, hemodynamics, etc.

Emerging and Future Prospects 

This PRIMER focuses on the attributes and limitations of the murine cancer models that currently best emulate our existing understanding of human cancers, an ever-expanding awareness of which is required to drive the development of effective preclinical platforms. The high cost and low yield of efficacious therapies, despite clinical evaluation of countless potential therapeutics, motivate the use and development of preclinical PDX and GEM models to guide clinical research. Ultimately, collective use of a variety of model systems will likely be required to successfully improve clinical outcomes.



Optimal mouse studies are cumbersome enough to preclude the simultaneous evaluation of numerous drugs and unbiased libraries; high-throughput in vitro screening systems are therefore essential precursors to in vivo evaluations. Despite their limitations, cancer cell lines have proven valuable in in vitro drug screens for uncovering mechanisms of acquired drug resistance (Torrance et al., 2001), and technologies such as RNAi and CRISPR/Cas9 have enhanced their versatility (Corcoran et al., 2013; Shalem et al., 2014). However, cancer cell-line screens identify only drugs that target intrinsic cancer cell functions. Targeting tumor stroma or microenvironment/tumor cell interactions requires in vitro systems that approximate the composition of cancers by preserving important cancer constituents, cell-cell interactions, and architectural features. To this end, several ex vivo platforms have been developed, including spheroids, organoids, microtumors (tumor tissue in synthetic matrix), and tissue slices (Burdett et al., 2010; Mendoza et al., 2010; Yamada and Cukierman, 2007). While optimization and validation of emerging ex vivo models for drug screening are ongoing, many may be incorporated into early phases of drug development, resulting in efficient triage and increased success in vivo.

In addition to ex vivo systems, non-murine whole-organism drug screens have shown promise for early triage (Gao et al., 2014). Owing to their relatively small size, low cost, and high fecundity, invertebrates such as flies (Drosophila) and nematodes (C. elegans) are attractive screening hosts. Furthermore, zebrafish (Danio rerio) are particularly well suited for high-throughput screens because of rapid extra-uterine development, embryonic transparency, and recently developed pigment-deficient strains that facilitate imaging (Barriuso et al., 2015). Using automated high-content and high-throughput platforms, zebrafish can be used for chemical, genetic, and pathway-based screens (Lieschke and Currie, 2007). Notably, data generated from zebrafish models have been used in clinical trials. For example, the pyrimidine biosynthesis enzyme DHODH was identified in zebrafish screens as a novel melanoma drug target, and a clinical trial is underway in which patients are being treated with the DHODH inhibitor leflunomide (Hagedorn et al., 2014; White et al., 2011). Zebrafish have also been used as hosts for human and mouse xenografts to monitor invasiveness, angiogenesis, and drug responses in real time (Zhang et al., 2015). However, as with xenotransplantation of human cells into mice, inappropriate tumor-host interactions could limit the relevance and translational value of fish models.

Optimizing preclinical models so that they can impact clinical practice will require overcoming challenges in several arenas (Table 3). Achieving this goal will undoubtedly require the expansion and integration of organized efforts by many parties. The sophistication of such preclinical studies requires expertise in many disparate fields and necessitates the involvement of scientists in the public sector, who often possess critical expertise and mechanisms not available in the private sector. However, communication and data sharing among investigators and organizations, though essential for efficient optimization of effective preclinical standard operating procedures, are limited. A future priority will be to develop interactive web-based systems to house and mine experimental databases and SOPs for community sharing. Such organized initiatives will begin to meet the significant and immediate need to revolutionize the accuracy of preclinical assessment and to develop and use PDX- and GEM-based disease models in research, increasing the number of effective treatments that reach clinical trials and, thus, cancer patients.

In summary, we now have a wealth of model systems that show early promise in establishing robust preclinical assessment platforms for improving clinical success. Each system has specific and sometimes unique value, and all will undoubtedly play a significant role in varied aspects of future preclinical studies. At this juncture, systematic comparisons of how well distinct model systems predict human outcomes have not been carried out and are needed in order to construct sound preclinical operating principles. The selection of models for a given study will depend on its purpose. While 2D cell cultures are useful for identifying cancer cell-intrinsic vulnerabilities, 3D ex vivo methods incorporate assessment of multicellular interactions. Non-mammalian animals further offer reasonable throughput in complex biological systems, while PDX and GEM models provide the best representation of tumor microenvironments, physiological responses, and disease pathology. GEMs further allow evaluation of immune system interventions and of responses unique to disease that develops in situ. Ultimately, the complementary use of many of these models and continual efforts to improve their effectiveness will propel preclinical studies into a new era of cancer therapeutics development. This is a uniquely exciting era wherein preclinical models, rather than serving simply to confirm clinical outcomes, have the potential to routinely fuel optimized clinical success.

SUPPLEMENTAL INFORMATION 

Supplemental Information can be found with this file at http://dx.doi.org/10. 
SUMMARY

Radial glia, the neural stem cells of the neocortex, are located in two niches: the ventricular zone and the outer subventricular zone. Although outer subventricular zone radial glia may generate the majority of human cortical neurons, their molecular features remain elusive. By analyzing gene expression across single cells, we find that outer radial glia preferentially express genes related to extracellular matrix formation, migration, and stemness, including TNC, PTPRZ1, FAM107A, HOPX, and LIFR. Using dynamic imaging, immunostaining, and clonal analysis, we relate these molecular features to distinctive behaviors of outer radial glia, demonstrate the necessity of STAT3 signaling for their cell cycle progression, and establish their extensive proliferative potential. These results suggest that outer radial glia directly support the subventricular niche through local production of growth factors, potentiation of growth factor signals by extracellular matrix proteins, and activation of self-renewal pathways, thereby enabling the developmental and evolutionary expansion of the human neocortex.

INTRODUCTION 

The human neocortex contains 16 billion neurons of diverse types that develop from an initially uniform neuroepithelium. In the ventricular zone (VZ), radial glia undergo interkinetic nuclear migration and possess apical processes that contact the ventricle and form adherens junctions. Apical complex proteins transduce signals from the cerebrospinal fluid that are critical for the survival, proliferation, and neurogenic capacity of ventricular radial glia (vRG) (Lehtinen et al., 2011). However, the majority of human radial glia are located in the outer subventricular zone (OSVZ) (Lewitus et al., 2013). These outer radial glia (oRG) retain basal processes but lack apical junctions and undergo a distinct migratory behavior, mitotic somal translocation, directly preceding cell division (Hansen et al., 2010). Thus, vRG and oRG cells reside in distinct niches defined by differences in anatomical location, provision of growth factors, cell morphology, and behavior (Fietz et al., 2010). Although oRG cells may generate the majority of cortical neurons (Lewitus et al., 2013; Smart et al., 2002), the molecular features sustaining the neural stem cell properties of oRG cells in the OSVZ niche are largely unknown, and the long-term proliferative capacity of these cells has not been examined.

Understanding the molecular programs specifically employed by oRG cells would provide insights into mechanisms of cortical development and support strategies to generate this cell type in vitro. Previous studies have attempted to identify genes uniquely expressed in oRG cells using a variety of transcriptional profiling strategies, including comparisons between microdissected samples (Fietz et al., 2012; Miller et al., 2014) and between cell populations expressing particular surface proteins (Florio et al., 2015; Johnson et al., 2015), but the difficulty of isolating bona fide oRG cells has made clear definition of their gene expression profiles challenging.

To specifically compare molecular features of radial glia in the VZ and OSVZ, we performed RNA sequencing of single cells captured from these two zones without additional enrichment steps. We then classified cells by analyzing thousands of genes that vary across cells and isolated radial glia from other cell types in silico (Pollen et al., 2014). We find that the proneural gene networks recently attributed to oRG cells are largely restricted to intermediate progenitor cells. Within classically defined radial glia, we discover molecular distinctions between vRG and oRG cells. The transcriptional state enriched in oRG cells includes genes involved in extracellular matrix production, epithelial-to-mesenchymal transition, and stem cell maintenance. Surprisingly, we find that components of the LIFR/STAT3 self-renewal pathway are selectively expressed by oRG but not vRG cells, and we confirm that STAT3 signaling is necessary for oRG cell-cycle progression. We further find that single oRG cells have the capacity to produce hundreds of deep and upper cortical layer neurons. Based on these results, we propose that oRG cells directly support the development of an enlarged OSVZ neural stem cell niche through the local production of growth factors, the expression of extracellular matrix proteins that potentiate growth factor signaling, and the activation of the LIFR/STAT3 signaling pathway.

RESULTS 

Molecular Diversity of Cells in the Cortical Germinal Zones

To analyze molecular features of cells in the germinal zones during human cortical neurogenesis, we captured single cells from microdissected VZ and SVZ specimens of human cortex at gestational weeks 16-18 (GW16-18) and generated sequencing libraries (schematic, Figure 1A). We subsequently analyzed 393 single cells from three individuals in which we detected at least 1,000 genes (Table S1). To classify cells, we performed principal component analysis (PCA) and used expectation-maximization clustering to group cells based on their position in PC space (Figure S1 and Experimental Procedures). Based on the expression of known marker genes, we interpreted groups to represent cells along the cortical excitatory lineage and inhibitory interneurons generated in the ventral telencephalon (Figures 1B-1D and S1 and Table S2).
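The classification step above (cells filtered by number of detected genes, projected by PCA, then grouped by expectation-maximization clustering in PC space) can be sketched in Python with NumPy. This is a simplified stand-in written under stated assumptions, not the authors' pipeline, which is described in their Experimental Procedures: it assumes a cells-by-genes count matrix, a log1p transform, a fixed number of clusters k, and a diagonal-covariance Gaussian mixture with farthest-point initialization. All function names and parameters are hypothetical.

```python
import numpy as np


def filter_cells(counts, min_genes=1000):
    """Keep cells (rows) in which at least `min_genes` genes are detected (count > 0)."""
    detected = (counts > 0).sum(axis=1)
    return counts[detected >= min_genes]


def pca_scores(counts, n_components=4):
    """Log-transform counts and project cells onto the top principal components."""
    x = np.log1p(counts.astype(float))
    x -= x.mean(axis=0)
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:n_components].T


def em_cluster(points, k, n_iter=200, seed=0):
    """Diagonal-covariance Gaussian mixture fitted by expectation-maximization."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    # Farthest-point initialization spreads the starting means across PC space.
    idx = [int(rng.integers(n))]
    for _ in range(1, k):
        sq_dist = ((points[:, None, :] - points[idx][None]) ** 2).sum(axis=2)
        idx.append(int(sq_dist.min(axis=1).argmax()))
    means = points[idx].copy()
    variances = np.tile(points.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each cluster for each cell (log space for stability).
        diff = points[:, None, :] - means[None, :, :]
        log_p = np.log(weights) - 0.5 * np.sum(
            diff ** 2 / variances + np.log(2 * np.pi * variances), axis=2)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means, and per-dimension variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ points) / nk[:, None]
        diff = points[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + 1e-6
    return resp.argmax(axis=1)


if __name__ == "__main__":
    # Synthetic data: two hypothetical cell types that differ in which genes are high.
    rng = np.random.default_rng(1)
    type_a = rng.poisson(5, size=(20, 50))
    type_b = rng.poisson(5, size=(20, 50))
    type_a[:, :25] += 50
    type_b[:, 25:] += 50
    cells = filter_cells(np.vstack([type_a, type_b]), min_genes=10)
    labels = em_cluster(pca_scores(cells, n_components=2), k=2)
    print(labels)
```

In a real analysis the number of components and clusters would be chosen from the data, and clusters would still have to be interpreted from marker genes, as the text does for radial glia and intermediate progenitors.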

We further examined groups of cells expressing known 
markers of the cortical excitatory neuron lineage (schematic, 
Figure 2A). Four groups robustly expressed markers of human 
radial glia SLC1A3, PAX6, SOX2, PDGFD, and GLI3 (yellow 



Figure 1. Molecular Diversity of Single Cells
from Human Cortical Germinal Zone 

(A) Schematic representation of major cell pop- 
ulations of developing cortex. VZ, ventricular 
zone; SVZ, subventricular zone; IZ, intermediate 
zone; SP, subplate; CP, cortical plate; MZ, 
marginal zone. 

(B) Representation of transcriptional heterogeneity 
of germinal zone cells profiled by single-cell 
mRNA-seq. Cells are arranged according to their 
position determined using t-distributed stochastic 
neighbor embedding. 

(C) Heatmap showing gene expression levels for 
1% of genes most strongly contributing to PC1-4.
Select marker genes are highlighted. Groups 
represent clusters with highest approximately 
unbiased p values following multiscale boot- 
strapping of hierarchical clustering based on 
expectation-maximization cluster assignments 
(see also Figure S1).

(D) Interpretation of distinct cortical and ventral 
telencephalic lineages detected among germinal 
zone cells. 



bar, Figure 2A). Another four groups re- 
tained a reduced level of PAX6, SOX2, 
and SLC1A3 expression, but also ex- 
pressed early neuronal markers such as 
STMN2 and NEUROD6. These groups 
were also characterized by the absence 
of canonical radial glia marker expression, including VIM and HES1, and by
the specific expression of canonical and novel intermediate
progenitor markers EOMES (TBR2), ELAVL4, NEUROG1,
NEUROD1, NEUROD4, PPP1R17, and PENK (magenta bar, Figures 2A, 2B, and S1) (Hevner et al., 2006; Kawaguchi et al., 2008).
We found that the vast majority of cells expressing EOMES and 
PPP1R17 mRNA also expressed EOMES protein with variable 
SOX2 expression (Figure S2). Immunostaining for PPP1R17 
revealed diverse morphologies of these cells, including multi- 
polar cells with short processes, as well as unipolar and bipolar 
cells with one or two radially or tangentially oriented processes. 
Regardless of morphology, these progenitor cells did not ex- 
press the classical molecular signature of radial glia (Figure S2). 
Thus, our analysis provides a clear distinction between radial 
glia and intermediate progenitor cells. Future studies may relate 
the molecular heterogeneity of intermediate progenitors and 
the relative expression of radial glial and proneural genes 
to the diverse and dynamic morphological features reported in 
OSVZ progenitors (Betizeau et al., 2013). 

Major Sources of Transcriptional Variation among 
Radial Glia Relate to Cell Cycle and Stem Cell Niche 

We next analyzed variation in gene expression across 107 cells 
from the four groups that robustly expressed canonical markers 
of radial glia. We anticipated that cell cycle would be the major 
source of transcriptional variation across proliferative popula- 
tions (Pollen et al., 2014). Indeed, genes involved in cell-cycle 
regulation, mitosis, and DNA replication explained most variation 
Figure 2. Major Sources of Transcriptional
Heterogeneity among Radial Glia 

(A) Analysis of cell-type identity of in-silico-sorted 
cortical lineage cells (schematic); heatmap illus- 
trates expression of identity genes across cells. 
Colored bars below heatmap highlight membership of cells into major groups (Figure S1).

(B) Interpretation of sequential gene expression 
changes during cortical excitatory neuron differ- 
entiation (see also Figure S2). 

(C) Schematic illustrating in silico sorting of clas- 
sically defined radial glia from the set of cortical 
lineage cells. Radial glia themselves display het- 
erogeneity with respect to cell-cycle progression, 
morphology, anatomical position, and behavior, 
and PCA reveals major sources of transcriptional 
variation among radial glia. Histogram of PCA 
gene loading scores with gene ontology enrich- 
ments highlighted (adjusted p value, Fisher’s 
exact test). 

(D) Interpretation of cell-cycle phases for radial glia 
based on clustering according to sample scores 
along PC1, 2, and 4 (see also Figure S3).

(E) Schematic of the anatomical sources of radial 
glia and violin plots illustrate prediction that VZ and 
adjacent inner SVZ contain a mixed population of 
vRG and oRG cells. 

(F) Histogram of VZ and OSVZ radial glia sample 
scores and gene loading scores along PC3. Radial 
glia from OSVZ (n = 39) have significantly higher 
PC3 scores than radial glia from VZ (n = 68) across 
3 samples (p < 10⁻⁴, Welch t test, see also Figure S3). Violin plots show distribution of expression values for strongly loading genes across cells
from VZ and OSVZ sources. 

(G) Interpretation of oRG and vRG identity based 
on clustering cells according to top 1% of genes
loading PC3. Schematic highlights the grouping of 
radial glia by inferred cell type rather than by 
anatomical source. Violin plots show distribution 
of expression values within inferred cell types. 



along PC1, PC2, and PC4 (Figure 2C and Table S3). Clustering
radial glia based on variation along these axes revealed cell 
groups representing G1, G1/S checkpoint, and G2/M check- 
point (Figures 2D and S3). During interkinetic nuclear migration, 
the cell bodies of radial glia migrate away from the ventricle during G1, toward the ventricle during S/G2 phase, and divide at the
ventricular surface. We examined the expression of a novel pre- 
dicted G1 marker, CRYAB, and found that cells expressing 
CRYAB transcript were displaced from the ventricle, rarely ex- 
pressed the G2/M marker phospho-histone H3, and rarely incor- 
porated BrdU, a label for DNA replication, consistent with the G1 
specificity of this transcript (Figure S3). Thus, differentiation and 
cell cycle are major sources of transcriptional heterogeneity 
among cells in the germinal zone, and single-cell analysis reveals 
novel molecular features of these states. 
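The gene ontology enrichments highlighted in Figure 2C were assessed with Fisher's exact test. A minimal sketch of the one-sided (over-representation) form follows, computed as the upper tail of the hypergeometric distribution. The argument names and the example counts are hypothetical, and the multiple-testing adjustment applied to the reported p values is not reproduced here.

```python
from math import comb


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test for over-representation (hypergeometric upper tail).

    k: genes in the selected set that carry the annotation
    n: size of the selected set (e.g. top-loading genes on one PC)
    K: annotated genes in the background universe
    N: size of the background universe
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # math.comb returns 0 when the lower index exceeds the upper one, so
    # impossible overlaps contribute nothing to the tail sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Hypothetical counts: 12 of 50 selected genes carry a term held by 400 of 15,000 genes.
print(fisher_enrichment_p(12, 50, 400, 15000))
```

Exact integer arithmetic in `math.comb` avoids overflow for genome-scale gene universes; the only floating-point step is the final division.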

We hypothesized that differences in stem cell niche occu- 
pancy would also contribute to variation among radial glia. De- 
scriptions of cell behavior and morphology suggest that the VZ 
and adjacent inner SVZ contain mixed populations of vRG cells 
and oRG cells destined to migrate to the OSVZ (Reillo et al., 2011) (Figure 2E). In contrast, the OSVZ contains a more pure
population of oRG cells that lack apical processes (Fietz et al., 
2010; Hansen et al., 2010), although the morphology of these 
cells may also be dynamic and diverse (Betizeau et al., 2013; 
Gertz et al., 2014). We found that the anatomical source of radial
glia was significantly associated with the position of cells along 
PC3 (Figures 2F and S3). Indeed, many of the genes with strong 
positive and negative loading scores along PC3 showed differ- 
ential expression patterns between radial glia collected from 
VZ and SVZ (Figure 2F). By clustering radial glia based on the 
1% of genes most strongly loading PC3, we identified two transcriptionally distinct groups, one almost purely composed of
cells from VZ that we interpreted as vRG cells and another 
composed of cells from both VZ and SVZ that we interpreted 
as oRG cells (Figure 2G). 
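The Figure 2F legend reports that OSVZ radial glia (n = 39) have higher PC3 scores than VZ radial glia (n = 68) by a Welch t test. A minimal sketch of the Welch statistic and its Welch-Satterthwaite degrees of freedom follows; the score values are hypothetical. A p value would then come from the t distribution with those degrees of freedom, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's t test, this does not assume equal variances in the two groups.
    """
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two observations")
    se2_a = variance(a) / len(a)  # squared standard error of the mean, group a
    se2_b = variance(b) / len(b)  # squared standard error of the mean, group b
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


# Hypothetical PC3 scores for radial glia from two anatomical sources.
osvz_scores = [1.2, 0.8, 1.5, 0.9, 1.1]
vz_scores = [-0.4, 0.1, -0.9, 0.3, -0.2, -0.6]
t, df = welch_t(osvz_scores, vz_scores)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Welch's form suits this comparison because the two anatomical groups differ in size and need not share a variance.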

Predicted Markers Relate to Position, Morphology, 
and Behavior of oRG Cells 

To relate these distinct transcriptional states to the stem cell 
niches of the developing neocortex, we first searched for genes 
Figure 3. Candidate oRG Markers Relate to Position, Morphology, and Behavior of Cell Type

(A) Scatterplot highlighting specificity of genes to inferred radial glia subpopulations. Specificity is calculated by Pearson correlation with an idealized marker 
gene expressed only in candidate oRG cells (y axis) or candidate vRG cells (x axis). Orange, green, and yellow boxes highlight genes with predicted specificity for 
oRG, vRG, and all radial glia cells (pan-RG), respectively (see also Figure S3). 

(B) Left heatmap showing expression of oRG, vRG, and pan-RG genes across inferred cell types and their average expression in microdissected cortical tissue 
samples (right heatmap) (Miller et al., 2014). 

(C) Similarity matrix of oRG- and vRG-specific gene expression levels across radial glia cells. 

(D) Representative examples of in situ hybridization staining for candidate vRG and oRG markers in human cortical tissue sections at GW18.2. Inset shows higher
magnification of positively stained region (scale bar, 25 μm).

(legend continued on next page) 
likely to distinguish predicted radial glia subtypes. We measured
the specificity of genes by their correlation with an ideal marker 
gene uniformly expressed in only one putative radial glia subpop- 
ulation across all 393 cells (Figures 3A and S3). We identified 67 
candidate marker genes strongly correlated with the oRG popu- 
lation, 33 candidate genes strongly correlated with the vRG pop- 
ulation, and 31 genes strongly correlated with both radial glia 
populations (Figure 3A, orange, green, and yellow boxes, respectively; Table S3). In support of these predictions, we observed
that candidate vRG markers showed higher expression in the 
VZ, whereas candidate oRG markers showed higher expression 
in the SVZ across tissue samples collected from developing hu- 
man cortex, and their expression levels were inversely correlated 
across radial glia cells (Figures 3B and 3C) (Miller et al., 2014). 
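The specificity score described above correlates each gene with an idealized marker that is uniformly expressed in one putative subpopulation and absent from all other cells. A minimal sketch follows, assuming hypothetical gene names, labels, and expression values; the cutoff used to call a gene "strongly correlated" is not specified here and is left out.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def ideal_marker(labels: Sequence[str], target: str) -> List[float]:
    """Idealized marker: uniform expression (1.0) in the target group, 0.0 in every other cell."""
    return [1.0 if label == target else 0.0 for label in labels]


def marker_specificity(expression: Dict[str, Sequence[float]],
                       labels: Sequence[str], target: str) -> Dict[str, float]:
    """Correlate each gene's expression across all cells with the idealized marker."""
    ideal = ideal_marker(labels, target)
    return {gene: pearson(values, ideal) for gene, values in expression.items()}


# Hypothetical expression values across five cells with inferred identities.
labels = ["oRG", "oRG", "vRG", "vRG", "IPC"]
expression = {"GENE_A": [6.0, 5.0, 0.5, 0.0, 0.2], "GENE_B": [0.0, 0.3, 4.0, 5.0, 0.1]}
print(marker_specificity(expression, labels, "oRG"))
```

Scoring the same genes against the vRG idealized marker gives the second axis of a plot like Figure 3A, and genes high on both axes would fall in the pan-RG category.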

To further investigate candidate marker genes, we performed 
in situ hybridization at GW17-19, stages of peak neurogenesis. 
We found that expression of vRG candidates, CRYAB, PDGFD,
TAGLN2, FBXO32, and PALLD, was strongest in the VZ,
whereas expression of oRG candidates, HOPX, PTPRZ1, TNC,
FAM107A, and MOXD1, was strongest in the OSVZ (Figure 3D).
Quantification revealed that the vast majority of cells expressing
TNC, PTPRZ1, or HOPX also expressed the radial glia marker
SOX2, but not the intermediate progenitor marker EOMES or
neuronal marker SATB2 (Figures 3E and S4). Immunostaining revealed expression of HOPX, PTPRZ1, and TNC proteins in cells
with basal fibers that lacked EOMES expression, linking this mo- 
lecular identity to the typical morphology of oRG cells (Figure 3F). 
To next relate this molecular identity to distinctive oRG behav- 
iors, we performed time-lapse imaging of organotypic cortical 
slices from GW15, GW17, GW18, GW18.7, and GW19.5 speci- 
mens infected with GFP-expressing adenovirus and then exam- 
ined the expression of the most specific oRG marker, HOPX. We 
observed that cells undergoing mitotic somal translocation 
behavior can generate SOX2/HOPX double-positive daughter 
cells with long basal processes characteristic of oRG cells 
throughout neurogenesis (Figures 3G and S4). Together, these 
results connect the molecular identity determined from single- 
cell RNA sequencing to the anatomical location, morphology, 
and behavior of oRG cells. 

Specific Expression of Stemness Pathways Suggests
Mechanism for Maintaining the OSVZ Stem Cell Niche 

Because oRG cells lack direct access to trophic factors distrib- 
uted by the cerebrospinal fluid (Lehtinen et al., 2011), we 
explored whether genes enriched in oRG cells relate to known 
functional categories related to growth factor signaling (Fig- 
ure 4A). We found that protein products of many genes enriched 
in oRG cells mediate interactions with the extracellular matrix 
(Figures 4B-4D and Table S3). These proteins included TNC, 
PTPRZ1, SDC3, HS6ST1, and ITGB5, which cooperate to promote neural stem cell maturation by controlling local concentrations of fibroblast and epidermal growth factors (Barros et al.,
2011; Garwood et al., 2004; Milev et al., 1998; Szklarczyk 
et al., 2015). Furthermore, the PTPRZ1 ligand PTN was ex- 
pressed in both radial glia populations but was the most signifi- 
cantly upregulated gene in oRG cells (Table S4). We confirmed 
by immunostaining that TNC and PTPRZ1 are expressed in a 
subset of ITGB5-positive oRG cells (Figure 4E). Many of the 
cell-surface proteins enriched in oRG cells are also highly over- 
expressed in glioblastomas compared with normal human 
astrocytes, including TNC, PTPRZ1, and LGALS3BP (Autelitano
et al., 2014; Nie et al., 2015), and PTN stimulation of PTPRZ1 is 
sufficient to stimulate the coordinated processes of epithelial- 
to-mesenchymal transition in a glioblastoma cell line (Perez- 
Pinera et al., 2006). Therefore, we investigated whether human 
glioblastoma samples contain cell populations that co-express 
oRG markers. In a recent study of five primary glioblastoma samples, PTN, PTPRZ1, and FABP7 were the genes most correlated
with a stemness signature across single tumor cells (Patel et al.,
2014). We extended this analysis to all predicted oRG and vRG
markers (Figure 4F) and found that oRG markers were more
highly correlated with both the stemness and general radial
glia signatures than vRG markers (Figure 4G), suggesting that
oRG-enriched genes are associated with stemness pathways
and growth factor signaling in both the developing OSVZ stem 
cell niche and in primary glioblastoma. 
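The Figure 4G legend reports p values for differences between standardized correlation coefficients. One common way to compare two correlations is Fisher's z-transformation, sketched below. This sketch treats the two correlations as independent, which is an assumption: correlations computed on the same tumor cells are dependent, and the specific test used in the study is not stated here. The coefficients in the example are hypothetical; only the cell count of 598 comes from the Figure 4F legend.

```python
from math import atanh, sqrt
from statistics import NormalDist
from typing import Tuple


def compare_correlations(r1: float, n1: int, r2: float, n2: int) -> Tuple[float, float]:
    """Two-sided z test for the difference between two independent Pearson correlations.

    Each r is mapped to Fisher's z = atanh(r), whose sampling variance is about 1 / (n - 3).
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each correlation needs more than 3 observations")
    se = sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (atanh(r1) - atanh(r2)) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical coefficients, each computed across 598 tumor cells.
z, p = compare_correlations(0.6, 598, 0.3, 598)
print(f"z = {z:.2f}, p = {p:.2g}")
```

The transform is used because the sampling distribution of r is skewed near plus or minus 1, whereas atanh(r) is approximately normal with a variance that depends only on n.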

We next explored whether additional signaling pathways were 
upregulated in oRG cells. We found that LIFR and its co-receptor 
IL6ST (GP130) were specifically expressed in oRG cells (Figures 
4H and 4I). LIF signaling through LIFR and IL6ST phosphorylates
STAT3 at tyrosine 705 (p-Y705) to promote stem cell mainte- 
nance (Huang et al., 2014). Immunostaining revealed that p- 
Y705-STAT3 was specifically localized at the nuclei of oRG cells 
and not detected in other cell types (Figures 4I and S5). To investigate the function of STAT3 in oRG cells, we pharmacologically
blocked STAT3 phosphorylation in human slice cultures. After 
2 days, we observed a reduction in the proportion of oRG cells 
that incorporated bromodeoxyuridine (BrdU) (Figure 4J), which 
is consistent with the proposed role of phosphorylated STAT3 
in neural stem cell maintenance (Hong and Song, 2015; Huang 
et al., 2014). In addition, we found that expression of constitu- 
tively active Stat3 in developing mouse cortex increased the 
proportion of electroporated cells expressing Sox2, but not 
Eomes, compared with expression of only the fluorescent re- 
porter (Figure S5). Together, these results support a role for 
STAT3 (p-Y705) in maintaining stemness of human oRG cells.
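The slice-culture comparison in Figure 4J uses a paired Student's t test (n = 4), pairing inhibitor-treated and DMSO control slices. A minimal sketch of the paired statistic follows, assuming pairs are matched by position; the BrdU fractions are hypothetical. The p value would come from the t distribution with n - 1 degrees of freedom, for example via `scipy.stats.ttest_rel`.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(control: Sequence[float], treated: Sequence[float]) -> Tuple[float, int]:
    """Paired Student's t statistic and its degrees of freedom (n - 1).

    Pairs are matched by position, e.g. DMSO and inhibitor slices from the same specimen.
    """
    if len(control) != len(treated) or len(control) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [t - c for c, t in zip(control, treated)]
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
    return t_stat, len(diffs) - 1


# Hypothetical fractions of radial glia that incorporated BrdU, four paired specimens.
dmso = [0.42, 0.38, 0.45, 0.40]
inhibitor = [0.30, 0.29, 0.36, 0.27]
print(paired_t(dmso, inhibitor))
```

Pairing removes specimen-to-specimen variation in baseline proliferation, which matters when only four specimens are available.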

Extensive Neurogenic Capacity of oRG Cells 

We next examined the differentiation potential and proliferative 
capacity of human oRG cells (Figure 5A). Many oRG-enriched 



(E) In situ hybridization combined with immunostaining for identity markers. Black arrows highlight cells expressing marker transcript and SOX2, but not EOMES 
(see also Figure S4 containing further examples). Bar chart shows quantification of molecular identity of OSVZ cells positive for oRG marker mRNA. Values 
represent mean ± SEM; n = 3 biological replicates, GW16.7, GW18.2, and GW20. 

(F) Immunostaining of GW18 cortex for candidate oRG proteins and identity markers. White arrows highlight examples of immunopositive radial glia with staining
in basal fiber. Yellow arrows indicate varicosities of the basal fiber of an oRG cell infected with adenovirus-GFP, with TNC immunostaining in the cell body. 

(G) Time-lapse imaging of oRG cells undergoing MST and post-staining for HOPX, SOX2, and GFP. Yellow arrows highlight GFP-expressing cells. 

See also Figure S4. 
Figure 4. Pathways Enriched in oRG Cells Relate to Growth Factor Signaling and Maintenance of Proliferation

(A) Schema: oRG cells lack apical processes that transduce signals mediated by trophic factors in ventricular cerebrospinal fluid (CSF). 

(B) Heatmap of extracellular matrix gene expression across inferred cell types. Orange bar highlights genes with high oRG specificity across all cells, and purple 
bar highlights genes differentially expressed by DESeq. 

(C) Functional protein association network generated using String-db (Szklarczyk et al., 2015). 

(D) Model of TNC protein, interaction partners, and downstream pathways. 

(E) Immunolabeling of human OSVZ tissue sections reveals expression of ITGB5 by radial glia, including proliferating KI67+ and KI67- (top row, examples 
indicated with arrows), but not EOMES+ intermediate progenitors (yellow arrowheads denote PAX6+, EOMES+ intermediate progenitor cells, which do not 
express ITGB5, see also Figure S2). Top right panels show immunolabeling for ITGB5 and proliferation marker KI67 (white arrows indicate proliferating radial glia 
expressing ITGB5, and yellow arrow indicates non-actively proliferating radial glia expressing ITGB5). Bottom left rows show expression of TNC and PTPRZ1 in a 
subset of ITGB5+ radial glia (examples indicated with arrowheads). Bar charts represent quantification (mean ± SEM) across three biological replicates between 
GW16.5 and GW18. 

(F) Heatmap of oRG, vRG, pan-RG, and glioblastoma multiforme (GBM) stemness gene expression signatures across 598 cells from five primary GBM tumors (Patel et al., 2014).

(legend continued on next page) 
Figure 5. Extensive Proliferative and Neurogenic Capacity of oRG Cells

(A) Schema representing progenitor cell competence of radial glia, oRG cells, and glial progenitors (see also Figure S5). 

(B) In situ hybridization for genes expressed by oRG and other cell types but depleted in vRG cells and immunolabeling of GW18 human cortical sections (see also
Figure S6). Black arrows indicate examples of mRNA expressing oRG cells. 

(C) Experimental design for single-cell clonal lineage analysis of oRG cells labeled with retrovirus GFP (rvGFP). 

(D) Image and magnification of a well containing a single purified GFP-positive cell. Time-lapse imaging of single cell with arrow highlighting MST preceding first 
division. 

(E) Images show movie frames capturing the initial division and a resulting clone after 7 days in culture of a cell exhibiting oRG specific mitotic somal translocation 
(left) and a cell undergoing an initial stationary division with fiber retraction characteristic of intermediate progenitors and a resulting clone after 7 days in culture 
(right). Chart shows clone sizes at 7 days for cells classified as oRGs and IPCs based on first division and morphology.

(F) Immunostaining for neuronal or glial markers of 3 oRG cell clones after 50 days in vitro. Bar chart represents quantification of the average clone size of oRG 
cells. 

See also Figure S6. 



genes are also associated with astrocytes later in development, 
including TNC, ITGB5, DIO2, and ACSBG1 (Cahoy et al., 2008),
and LIFR signaling through STAT3 can promote gliogenesis in 
combination with BMP signaling (Bonaguidi et al., 2005). How- 
ever, oRG cells did not express other astrocyte markers such 
as AQP4, CA2, IL33, and ALDOC, which we observed in putative 
astrocytes later in development (Figure S5). In addition, oRG 
cells strongly expressed NOG (Figure 4H), which inhibits BMP 
signaling and promotes neurogenesis in GFAP-expressing progenitors (Bonaguidi et al., 2005; Bonaguidi et al., 2008; Lim
et al., 2000). We further noted that many genes upregulated in 
oRG cells relative to vRG cells were also strongly expressed 
by cortical neurons such as NPY, RTN1, CTNND2, SEZ6L, and 
NRCAM (Figures 5B and S6), suggesting a relationship between 
neurons and oRG cells. 

To further examine the neurogenic potential of oRG cells, we 
isolated single cells from the germinal zone by fluorescence-acti- 
vated cell sorting (FACS) and cultured them on a feeder cell layer 



(G) Bar graphs show that the oRG signature has the strongest correlation with the GBM stemness signature and with the pan-RG signature across GBM tumor
samples, p values report significance of difference between standardized correlation coefficients; error bars represent 95% confidence interval. 

(H) Heatmap showing average expression of selected genes across inferred cell types and validation of LIFR expression in oRG cells by in situ hybridization 
(examples indicated by arrows). 

(I) Immunostaining of human tissue section for phosphorylated STAT3 (p-Y705) and SOX2 (see also Figure S5). Top bar graph represents quantification of 
nuclear immunostaining for phosphorylated STAT3 across germinal zone depth starting from ventricular edge (0.0) to the basal edge of the OSVZ (1.0). Bottom
graph represents the molecular identity of STAT3+ Y705+ cells quantified in the OSVZ of three biological replicates between GW16.5 and GW18. Data represent
mean ± SEM. 

(J) Inhibition of STAT3 phosphorylation in organotypic OSVZ slice cultures reduces BrdU incorporation by radial glia. Images show representative examples of 
immunostained experimental sections. Bar graphs display quantification of BrdU incorporation by radial glia and intermediate progenitors in presence of STAT3 
phosphorylation inhibitors or control DMSO (n = 4, *p < 0.05, paired Student’s t test, error bars represent SEM). Arrows indicate examples of cells expressing 
SOX2 but not EOMES or SATB2 that did not incorporate BrdU over 48 hr period in inhibitor-treated slices. 
Figure 6. Molecular Signature of oRG Cells Emerges in VZ

(A) In situ hybridization for genes enriched in oRG cells across multiple developmental time points corresponding to human cortical neurogenesis. Sections were 
stained with antibodies against SOX2, EOMES, and SATB2 to identify major cell populations (see also Figures S4 and S7). 

(B) The proportion of oRG marker cells that express all combinations of protein markers was quantified across ten bins that span the germinal zone throughout 
neurogenesis. In all cases, oRG marker transcripts are almost exclusively expressed by SOX2+/EOMES-/SATB2- cells (black bars). Bar graphs highlight that, at 
GW13.5, oRG markers predominantly label radial glia in the VZ, but by GW14.5, oRG markers predominantly label radial glia cells in the SVZ. 

(legend continued on next page) 
(Figures 5C and 5D). Using time-lapse imaging, we followed individual cells for 1 week in vitro. Cells that displayed the distinctive
somal translocation directly preceding the first division were 
classified as oRGs, whereas cells that retracted processes and 
underwent stationary first division were classified as intermedi- 
ate progenitors. Consistent with their stem-like character, single 
oRG cells gave rise to larger clones than intermediate progenitor 
cells (Figure 5E). We followed three oRG clones in vitro for an 
additional 6 weeks, and they generated hundreds of daughter 
cells, including deep and upper layer neurons, as well as glial 
cells (Figures 5F and S6). Thus, human oRG cells express self-renewal pathways not detected in vRG cells during peak upper
cortical layer neurogenesis, generate diverse neural daughter 
cells, and have the capacity for extensive proliferation. 

oRG Signature Emerges in VZ and Is Conserved in 
Primates 

To further investigate the developmental and evolutionary origin 
of the transcriptional signature of oRG cells, we examined the 
expression of oRG marker genes across different stages of 
human corticogenesis and in different species. Surprisingly, we 
observed strong expression of transcripts encoding oRG marker 
genes in the VZ at early stages of human cortical development. 
This expression became progressively restricted to the OSVZ 
at later stages, with FAM107A persisting the longest in the VZ
(Figures 6A, 6B, and S7). TNC, FAM107A, and PTPRZ1 protein
expression followed a similar developmental progression, but 
TNC and PTPRZ1 antibodies only labeled subsets of radial glia 
cells in the VZ (Figure 6C). The coordinated expression of these 
genes in the VZ around GW13.5 coincides with the elaboration of
the OSVZ (Bayatti et al., 2008; Martinez-Cerdeno et al., 2012;
Shitamukai et al., 2011; Zecevic et al., 2005) and may represent
a molecular program involved in oRG specification. 
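Figures 4I and 6B quantify marker-positive cells across the germinal zone, from the ventricular edge (0.0) to the basal edge of the OSVZ (1.0), using ten bins. A minimal sketch of that binning follows, assuming equal-width bins on normalized depth; the exact bin boundaries used in the study are not stated here, and the cell positions are hypothetical.

```python
from typing import List, Sequence


def bin_fractions(depths: Sequence[float], positive: Sequence[bool],
                  n_bins: int = 10) -> List[float]:
    """Fraction of marker-positive cells in equal-width depth bins.

    depths: normalized position, 0.0 = ventricular edge, 1.0 = basal edge of the OSVZ.
    Empty bins are reported as NaN.
    """
    if len(depths) != len(positive):
        raise ValueError("depths and positive must have the same length")
    totals = [0] * n_bins
    hits = [0] * n_bins
    for depth, is_pos in zip(depths, positive):
        if not 0.0 <= depth <= 1.0:
            raise ValueError(f"depth {depth} outside [0, 1]")
        i = min(int(depth * n_bins), n_bins - 1)  # depth 1.0 goes into the last bin
        totals[i] += 1
        hits[i] += int(is_pos)
    return [h / t if t else float("nan") for h, t in zip(hits, totals)]


# Hypothetical cells: normalized depth and whether each expresses an oRG marker transcript.
depths = [0.02, 0.08, 0.15, 0.55, 0.62, 0.91, 1.0]
positive = [True, False, True, True, True, False, True]
print(bin_fractions(depths, positive))
```

Normalizing depth to the 0 to 1 span lets sections of different germinal-zone thickness, and different developmental ages, be compared bin by bin.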

During mouse development, oRG cells are rare and there is not 
a distinct OSVZ. To investigate whether human oRG markers are 
expressed in mouse radial glia, we used recently published gene 
coexpression networks (Lui et al., 2014) and found that, on 
average, human oRG markers are less likely to show specific 
expression in mouse radial glia than general radial glia markers 
(Figure 7A). Nonetheless, TNC expression has previously been 
detected in radial glia of the mouse ventral pallium (Garcion 
et al., 2004; Götz et al., 1998; Wiese et al., 2012). Immunostaining
for TNC and PTPRZ1 revealed that both proteins are most 
strongly detected in the VZ and SVZ of the lateral and ventral 
pallium (Figure 7B), where mouse oRG cells are most common 
(Wang et al., 2011b). Closer examination of SOX2-positive cells
in the SVZ revealed examples of putative mouse oRG cells ex- 
pressing TNC and PTPRZ1 (Figure 7B, insets). More widespread 
transcription of TNC, PTPRZ1 , and HOPX throughout the mouse 
cortical VZ coincides with the conclusion of cortical neurogene- 
sis (Figure S7). Thus, conserved elements of the oRG signature 
may reflect regional and temporal heterogeneity of mouse 
radial glia, but many genes enriched in human oRG cells are 



not expressed in mouse radial glia, including MOXD1 (Wang 
et al., 2011a), FAM107A, FBN2, BMP7, HS6ST2, LGALS3, and 
TKTL1 (Figure S7). In contrast to mouse, the developing ma- 
caque cortex contains a large OSVZ region and prominent 
oRG population (Betizeau et al., 2013; Smart et al., 2002). Using 
microarray data from developing macaque cortex, we found that 
the expression of oRG marker genes in macaque development 
mirrors that of human development (Figure 7C). We confirmed 
this pattern for select oRG marker genes by analyzing primary 
tissue samples. We detected expression of TNC, PTPRZ1 , and 
HOPX in macaque VZ early in development but found that 
OSVZ expression of these markers, along with FAM107A, 
predominates at later stages of corticogenesis (Figure 7D). 
Together, our data indicate that major elements of the transcrip- 
tional signature of human oRG cells are conserved in primates. 
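The Figure 7A legend compares differential percentile ranks of human oRG genes (n = 66) and pan-RG genes (n = 29) with a Wilcoxon rank sum test. A minimal sketch of that test follows, using average ranks for ties and a two-sided normal approximation without tie or continuity correction. The rank values in the example are hypothetical; library versions such as `scipy.stats.ranksums` or `scipy.stats.mannwhitneyu` would normally be preferred.

```python
from statistics import NormalDist
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Wilcoxon rank-sum (Mann-Whitney U) test with a two-sided normal approximation.

    Returns (U for x, p value). No tie or continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u1 - mu) / sigma
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return u1, p


# Hypothetical differential percentile ranks for two gene groups.
org_genes = [12.0, 35.5, 20.1, 48.0, 15.2]
pan_rg_genes = [70.3, 82.0, 65.4, 90.1]
print(rank_sum_test(org_genes, pan_rg_genes))
```

A rank-based test fits here because percentile ranks are bounded and need not be normally distributed within each gene group.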

DISCUSSION 

Our study identifies neuronal differentiation, cell-cycle progres- 
sion, and anatomical position as major sources of transcriptional 
variation across single cells sampled from germinal niches of 
the developing human cortex. The transcriptional state associ- 
ated with neuronal differentiation involves reduced expression 
of classical radial glia markers such as VIM and HES1 and upregulation of proneural transcription factors such as NEUROG1,
NEUROD4, and EOMES and neuropeptide signaling genes 
PENK, SSTR2, and OXTR. This transcriptional state was recently 
attributed to heterogeneity among oRG cells (Johnson et al., 
2015). However, based on expression of mRNA, EOMES protein,
and the novel marker PPP1R17, which reveals diverse multipolar
morphologies, we interpret this transcriptional state to represent 
intermediate progenitor cells. 

In contrast, we identify a novel transcriptional state, indepen- 
dent of neuronal differentiation, that distinguishes oRG from vRG 
cells by analyzing the major sources of variation among classi- 
cally defined radial glia. We relate this transcriptional state to 
the position, morphology, and dynamic behavior of oRG cells. 
Together, this multimodal characterization establishes an inte- 
grative identity for oRG cells. These neural stem cells are charac- 
terized by the expression of novel markers, including HOPX, 
TNC, ITGB5, as well as pan-radial glia markers such as VIM, 
HES1, and ATP1A2; the presence of a basal, but not apical, fiber;
mitotic-somal translocation behavior; and extensive proliferative 
and neurogenic capacity. This cell type is most abundant in the 
OSVZ stem cell niche for which it was named but also resides 
in the inner SVZ and VZ, and the transcriptional state first 
emerges in the VZ during early cortical neurogenesis. The oRG 
marker genes may enable the construction of molecular tools 
for selectively visualizing, manipulating, or purifying oRG cells 
in tissue and for evaluating the identity of human cortical progen- 
itor cells generated from pluripotent stem cells. In addition, these 
genes may provide insights into the cell types affected in neuro- 
developmental disorders. 



(C) The proportion of SOX2+/EOMES-/SATB2- cells that expressed TNC was quantified across the span of the germinal zone at GW13.5 and GW18.

(D) Immunolabeling of early developmental time points for oRG markers and SOX2. At GW13.5, TNC and PTPRZ1 protein show limited expression in radial glia 
close to the ventricular edge, whereas FAM107A protein is widely expressed in GW13.5 VZ. 

See also Figure S7. 



Cell 163, 55-67, September 24, 2015 ©2015 Elsevier Inc. 63 




[Figure 7 graphics. Recoverable labels: differential percentile rank (human specific vs. mouse specific); TNC, PTPRZ1, SOX2, SOX2/EOMES; FAM107A (E85), TNC (E110), PTPRZ1 (E110), FAM107A (E110); scale bars, 100 μm; Outer Radial Glia model showing transmembrane proteins and receptors (LIFR, GP130/IL6ST, PTPRZ1, ITGB5, FGFR, HS (C6-OSO3), SDC3), extracellular proteins (TNC, PTN, LIF, FGF), and intracellular pathways (STAT3/p-Y705, STAT3).]

Figure 7. Conservation of oRG Marker Gene Expression in Primates 

(A) Bar graph indicates average differential percentile rank of pan-RG, oRG, and vRG genes for radial glia gene coexpression network specificity in human and 
mouse. Compared with pan-RG genes (n = 29), human oRG genes (n = 66) show reduced conservation with mouse radial glia signature (p < 0.05, Wilcoxon rank 
sum test). 

(B) Mouse E15.5 and E18.5 cortical sections immunoreacted for Tnc and Ptprz1 along with Sox2 and Eomes. Inset images show a magnified view of SVZ region,
and arrows highlight examples of Sox2-positive, Eomes-negative cells that co-label for Tnc or Ptprz1. LGE, lateral ganglionic eminence; Ctx, cortex.

(C) Heatmaps show average expression level of oRG, vRG, and pan-RG genes in distinct regions of developing macaque cortex. IVZ, inner VZ; OVZ, outer VZ; the
NIH Blueprint Non-Human Primate (NHP) Atlas. In situ hybridization of macaque cortex showing expression of oRG marker genes mirrors human trajectory. 

(D) Radial glia in small rodent brains are concentrated along the ventricle and access cerebrospinal fluid trophic factors directly via apical processes. In contrast, 
large primate brains contain numerous radial glia in the OSVZ. Local production of growth factors by radial glia may provide additional trophic support to oRG 
cells in the OSVZ niche. 

(E) Increased expression of extracellular matrix proteins that potentiate growth factor signaling and activation of LIFR/STAT3 pathway may further maintain 
stemness in oRG cells.

See also Figure S7. 



Beyond simply marking oRG cells, the genes we identify 
belong to common pathways that suggest mechanisms by 
which human oRG cells actively maintain the OSVZ as a neural 
stem cell niche. Many of these genes promote growth factor 
signaling, including TNC, PTPRZ1, ITGB5, SDC3, HS6ST1,
IL6ST, and LIFR (Sim et al., 2006; Wiese et al., 2012). For
example, TNC potentiates FGF signaling to support the maturation of neural stem cells (Garcion et al., 2004), whereas integrin
signaling along the basal fiber promotes radial glia identity (Fietz 
et al., 2010). Interestingly, TNC contains EGF-like repeats and 
multiple binding domains for PTPRZ1, syndecans, integrins, and
other cell-surface receptors (Besser et al., 2012; von Holst, 
2008). Thus, TNC expression in oRG cells may couple key pro- 
tein networks regulating growth factor signaling, migration, and 
self-renewal. In addition, LIFR/STAT3 signaling is known to 
maintain radial glia neural stem cell identity (Bonaguidi et al.,
2005), and we show that p-Y705-STAT3 signaling is necessary
for normal cell-cycle progression in oRG cells but is surprisingly 
absent in vRG cells. Although STAT3 signaling can also 
contribute to gliogenesis, we speculate that expression of 
NOG may inhibit gliogenesis as observed in the rodent adult 
neurogenic germinal zones (Bonaguidi et al., 2008; Lim et al., 
2000). We directly examined the neural stem cell properties of 
oRG cells using single-cell clonal analysis. We find that single 
oRG cells at mid-neurogenesis can generate clones of nearly 
1,000 neuronal and glial daughter cells. This highlights the 
remarkable proliferative capacity of human oRG cells compared
with mouse radial glia, which typically generate 10-100 daughter cells
throughout the neurogenic period (Gao et al., 2014; Qian et al., 
2000; Vasistha et al., 2014). 

The cell behaviors that distinguish oRG from vRG cells— loss 
of adhesion, delamination, and rapid migratory bursts preceding 
cell division — have been compared to epithelial-to-mesen- 
chymal transition (Itoh et al., 2013). Recent work suggests that 
oRG and glioma cells both undergo myosin-II-dependent migratory
movements (Beadle et al., 2008; Ostrem et al., 2014). Interestingly,
many of the genes and proteins we detected in oRG
cells have been implicated in invasive migratory behavior, 
including genes expressed in the VZ when oRG cells first 
emerge. For example, TNC, ITGB5, and PTN/PTPRZ1 signaling 
promotes multiple aspects of epithelial-to-mesenchymal
transition (Bianchi et al., 2010; Katoh et al., 2013; Perez-Pinera et al.,
2006), PRKCA is necessary for the upregulation of SNAI1 and
downregulation of CLDN1 during these transitions (Kyuno 
et al., 2013), and FAM107A establishes focal adhesions and in- 
creases glioma invasiveness (Le et al., 2010). The expression 
of these genes suggests possible mechanisms by which oRG 
cells emerge from the VZ and undergo mitotic somal transloca- 
tion. More generally, we found the oRG transcriptional signature 
to be enriched in cells from primary glioblastoma and conserved 
in macaque, suggesting that the development of invasive 
glioblastoma and the evolutionary expansion of the OSVZ may 
recruit common sets of genes controlling migration and self- 
renewal. 

Sequencing of single-cell mRNA while retaining cell position 
information provides a general method for identifying distinct 
subpopulations whose molecular identity may relate to microen- 
vironment. Here, we explored variation in radial glia gene expres- 
sion while considering stem cell niche as a covariate. Our results 
revealed novel molecular features of neural stem cell populations 
previously distinguished only by cell behavior, morphology, and 
position. Together with recent findings (Fietz et al., 2012; Lui 
et al., 2014), these results highlight three mechanisms that 
may maintain stemness of the expanded oRG population in the
OSVZ stem cell niche: local production of trophic factors such 
as PTN and BMP7 by radial glia, expression of extracellular 
matrix proteins that potentiate growth factor signaling, and
activation of the LIFR/p-STAT3 signaling pathway (Figures 7D and
7E). Because the oRG population is thought to be responsible
for the majority of human cortical neurogenesis and OSVZ size 
correlates with the evolutionary expansion of the brain, future 
studies can investigate the role of these genes in neurodevelop- 
mental disorders and cortical evolution. 

EXPERIMENTAL PROCEDURES 
Single-Cell Analysis 

Micro-dissected VZ and SVZ tissue samples were dissociated using Papain 
(Worthington). Single-cell sequencing libraries were generated using the C1
Single-Cell Auto Prep Integrated Fluidic Circuit (Fluidigm), the SMARTer Ultra
Low RNA Kit (Clontech), and the Nextera XT DNA Sample Preparation Kit
(Illumina). Reads were aligned using TopHat2, and the expression of RefSeq
genes was quantified with featureCounts. Gene expression values
were normalized for library size as counts per million reads (CPM).
Libraries with fewer than 1,000 genes detected above 1 CPM were eliminated
as outliers. Cells were assigned to groups using PCA and Expectation-Maximi- 
zation clustering, and groups were interpreted based on the expression of 
known marker genes and tissue validation. The specificity of genes to each 
group was determined using Pearson’s correlation and confirmed with
DESeq2. The expression of cell-type markers was evaluated in silico using 
the Allen Institute Prenatal LMD Microarray Atlas (Miller et al., 2014) and NIH 
Blueprint NHP Atlas, as well as human and mouse gene coexpression net- 
works (Lui et al., 2014) and single-cell glioblastoma data (Patel et al., 2014), 
and in tissue using immunohistochemistry and in situ hybridization as 
described in the Supplemental Experimental Procedures. 
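As an illustration of the normalization and outlier filter described above, the following Python sketch converts raw gene counts to counts per million and drops libraries with fewer than 1,000 genes above 1 CPM. It is a minimal sketch under stated assumptions: counts are held in plain lists per cell, and the function names are hypothetical rather than part of the pipeline used in this study (which quantified counts with featureCounts).

```python
from typing import Dict, List


def counts_per_million(counts: List[int]) -> List[float]:
    """Scale one library's raw gene counts to counts per million mapped reads."""
    total = sum(counts)
    if total == 0:
        # An empty library has no usable expression values.
        return [0.0] * len(counts)
    return [c * 1e6 / total for c in counts]


def filter_libraries(
    libraries: Dict[str, List[int]],
    min_genes: int = 1000,
    cpm_threshold: float = 1.0,
) -> Dict[str, List[float]]:
    """Return CPM profiles for libraries with >= min_genes genes above cpm_threshold."""
    kept: Dict[str, List[float]] = {}
    for cell_id, counts in libraries.items():
        cpm = counts_per_million(counts)
        # Count genes detected strictly above the CPM threshold.
        detected = sum(1 for value in cpm if value > cpm_threshold)
        if detected >= min_genes:
            kept[cell_id] = cpm
    return kept


# Toy example with hypothetical counts; min_genes is lowered for a three-gene library.
toy = {"cell_a": [5, 5, 0], "cell_b": [0, 0, 0]}
print(sorted(filter_libraries(toy, min_genes=2)))  # ['cell_a']
```

With real data, each list would hold one value per RefSeq gene and the default min_genes=1000 would apply.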

STAT3 Signaling 

To examine the effect of decreased STAT3 signaling in oRG cells, we
cultured fetal cortical slices for 48 hr in the presence of inhibitors and then
performed immunostaining. In utero electroporation of a mutated form of STAT3
that mimics the Y705 phosphorylation state, driven by the EF1a promoter
(Addgene, 24983), was performed at E13.5. Timed-pregnant Swiss-Webster
mice were obtained from Simonsen Laboratories and maintained
according to protocols approved by the UCSF Institutional Animal Care and 
Use Committee. 

Single-Cell Clonal Analysis 

Cells dissociated from the cortical germinal zone were infected with a
pNIT-GFP retrovirus for 2-4 hr, cultured on Matrigel (BD Biosciences) for
3 days in media containing DMEM (Invitrogen, 11965), 1% B-27 supplement 
(Invitrogen, 12587-010), 1% N-2 supplement (17502-048), and recombinant 
human FGF-basic (10 ng/ml, Peprotech, AF-100-18B). We then used FACS 
(ARIA, BD Biosciences) to sort GFP-expressing cells at 1 cell/well into 96- 
well plates pre-seeded with feeder cells. We used time-lapse microscopy to 
identify the mitotic behavior of the initial cell divisions for each clone. After 
1 week, the cells were cultured for 6 weeks in media without FGF. 
SUMMARY

Cis-regulatory changes play a central role in
morphological divergence, yet the regulatory principles
underlying emergence of human traits remain poorly 
understood. Here, we use epigenomic profiling from 
human and chimpanzee cranial neural crest cells 
to systematically and quantitatively annotate
divergence of craniofacial cis-regulatory landscapes.
Epigenomic divergence is often attributable to genetic
variation within TF motifs at orthologous enhancers, 
with a novel motif being most predictive of activity 
biases. We explore properties of this cis-regulatory
change, revealing the role of particular retroele- 
ments, uncovering broad clusters of species-biased 
enhancers near genes associated with human facial 
variation, and demonstrating that cis-regulatory
divergence is linked to quantitative expression differ- 
ences of crucial neural crest regulators. Our work 
provides a wealth of candidates for future evolu- 
tionary studies and demonstrates the value of 
“cellular anthropology,” a strategy of using in-vitro- 
derived embryonic cell types to elucidate both 
fundamental and evolving mechanisms underlying 
morphological variation in higher primates. 

INTRODUCTION 

Since the discovery that the protein-coding regions of the 
genome remain largely conserved between humans and chim- 
panzees, it has long been postulated that morphological diver- 
gence between closely related species is driven principally 
through quantitative and spatiotemporal changes in gene
expression, mediated by alterations in cis-regulatory elements
(Carroll, 2008; King and Wilson, 1975; Wray, 2007). A number 
of excellent case studies have validated these early predictions 
and demonstrated that mutations or deletions affecting distal 
regulatory elements called enhancers can alter ecologically
relevant traits (Gompel et al., 2005; Shapiro et al., 2004; Attanasio
et al., 2013). Recent successes in full-genome sequencing 
and epigenomic strategies have enabled the first genome-wide 
comparisons of transcription factor (TF) binding and regulatory 
landscapes in closely related species, demonstrating the value 
of comparative epigenomics in the context of high genome
orthology for understanding principles of cis-regulatory evolution
(Bradley et al., 2010; He et al., 2011; Stefflova et al., 2013).
Nonetheless, despite the availability of human and chimpanzee
genomes, our knowledge of cis-regulatory divergence between
humans and our closest evolutionary relatives remains fairly 
speculative. Previous efforts have relied heavily on computa- 
tional approaches to pinpoint conserved non-coding elements 
that were either deleted or had undergone accelerated change 
specifically in the human lineage (McLean et al., 2011; Pollard 
et al., 2006; Prabhakar et al., 2006). Functional epigenomic com- 
parisons between humans and other primates have been largely 
limited to lymphoblastoid cell lines (Cain et al., 2011; Shibata 
et al., 2012; Zhou et al., 2014) or to profiling whole organs from 
more distantly related species (Cotney et al., 2013; Villar et al., 
2015). 

Recently, iPSCs were made available from our nearest living 
evolutionary relative, the chimpanzee (Marchetto et al., 2013), 
offering an opportunity to derive developmentally relevant and 
previously inaccessible tissue types in vitro. This allows aspects 
of species-specific development to be recapitulated in a dish, 
facilitating “cellular anthropology” through the discovery of 
cell-type-specific regulatory changes that occurred during re- 
cent human evolution. Here, we focus on the neural crest (NC), 
one of the embryonic cell populations most relevant to emer- 
gence of uniquely human traits. In vivo, NC cells (NCCs) arise 
during weeks ~3-5 of human gestation from the dorsal part of 
the neural tube ectoderm and migrate into the branchial arches 
and what will later become the embryonic face, consequently 
establishing the central plan of facial morphology (Bronner and 
LeDouarin, 2012; Cordero et al., 2011; Jheon and Schneider, 
2009). Within our recent evolutionary history, the modern human 
craniofacial complex has undergone dramatic changes in shape 
and sensory organ function, which helped to build a recognizably 
human face and were required to accommodate the transition
to bipedal posture, enlargement of the brain, extension of the lar- 
ynx for speech, and compensatory rotations of the orbits, olfac- 
tory bulb, and nasomaxillary complex (Bilsborough and Wood, 
1988; Lieberman, 1998; Spoor et al., 1994). 

To overcome the inability to obtain cranial NCCs (CNCCs) 
directly from higher primate embryos, we here employ a pluripo- 
tent stem-cell-based in vitro differentiation model in which spec- 
ification, migration, and maturation of human and chimpanzee 
CNCCs are recapitulated in the dish (Bajpai et al., 2010; Rada- 
Iglesias et al., 2012; this study). We compared TF and coactiva- 
tor binding, histone modifications, and chromatin accessibility 
genome-wide to annotate the divergent regulatory element 
repertoire of human and chimpanzee CNCCs. This information 
allowed us to explore, with unprecedented comprehensiveness 
and resolution, the mechanisms of tissue-specific enhancer 
landscape evolution within a developmentally relevant tissue 
type in humans and our nearest evolutionary relative. 

RESULTS 

Derivation of Human and Chimpanzee CNCCs 

Given the similarities in hominid gestational environment, we 
hypothesized that non-human primate CNCCs could be derived 
from pluripotent cells using the same cell culture conditions 
that we have previously applied to human embryonic stem 
cells (ESCs)/iPSCs (Bajpai et al., 2010; Rada-Iglesias et al.,
2012). Chimp iPSCs have recently become available and can 
be maintained in vitro under the same conditions as human
ESCs/iPSCs (Marchetto et al., 2013). Upon differentiation of 
our chimp iPSCs, we observed formation of highly mobile stel- 
late cells that were morphologically indistinguishable from hu- 
man CNCCs, expressed a broad range of migratory NC markers 
at levels equivalent to those seen in human cells, and had a very 
low level of HOX gene expression, a profile consistent with 
CNCC identity (Figures 1A-1C and S1A). To characterize
staging and homogeneity of our human and chimp CNCC
populations, we identified a panel of five cluster of differentiation
(CD) markers, whose expression is sensitive to the develop- 
mental progression of CNCC (see Experimental Procedures 
and Figure S1B). These markers provided a platform for us to
monitor and optimize our cell culture protocol for derivation 
and maintenance of primate CNCCs achieving metrics of homo- 
geneity greater than 90% regardless of the genetic background, 
initial cell source (e.g., iPSC versus ESC), or species (human 
versus chimp); see Figure S1C and Experimental Procedures.
Cultured primate CNCCs show a high correlation of expression 
signatures and epigenomic profiles with CNCCs isolated from 
chick embryos, reinforcing the NC identity of these in-vitro- 
derived cells (Figures S2A and S2B). Importantly, derived hu- 
man and chimp CNCCs are both capable of prolonged mainte- 
nance (for up to 18 passages) and sustained differentiation 
capacity into both mesenchymal and non-mesenchymal line- 
ages (Figure S2C). Furthermore, xenotransplantation of cultured 
human and chimp CNCCs into the dorsal neural tube of early 
chick embryos demonstrates their ability to engraft and then 
follow endogenous migration cues into the distal branchial 
arches (Figures S2D and S2E). 



Epigenomic Profiling of Human and Chimpanzee CNCCs 

For epigenomic profiling, we derived CNCCs from H9 hESCs and 
from iPSCs from two humans and two chimpanzees (Marchetto 
et al., 2013). We subsequently performed chromatin
immunoprecipitation and sequencing (ChIP-seq) using antibodies against
CNCC TFs (TFAP2A and NR2F1), a general coactivator (p300), 
and histone modifications associated with active regulatory
elements (H3K4me1, H3K4me3, and H3K27ac) (Figures 1A and 1E).
In parallel, we mapped genome-wide chromatin accessibility us- 
ing an assay for transposase-accessible chromatin (ATAC-seq) 
(Buenrostro et al., 2013). 

One crucial advantage of performing comparative epigenom- 
ics between human and chimpanzee, as opposed to a more 
distant primate relative, is the large similarity between genomes, 
which permits reciprocal mapping of sequencing reads to the 
reference genomes of both species. This allows for quantifica- 
tion of read enrichments from each species in the context of 
both reference genomes, removing otherwise difficult-to-con- 
trol-for biases due to mappability, ambiguous liftOver, and 
other technical caveats. Importantly, we could unambiguously 
assign one-to-one orthology between genomes for >95% of all 
enhancer candidates from either species, with the remaining 
4%-5% representing enhancers that fall within putative spe- 
cies-specific structural variants. We found that enrichments for 
all ChIP-ed factors and for chromatin accessibility were largely 
independent of the chosen reference genome and excluded all 
candidate elements for which enrichment divergence was
dependent upon the reference (<0.1%) or that did not map
uniquely in both genomes (see Experimental Procedures). Glob- 
ally, the observed epigenomic patterns at candidate regions 
were highly correlated for human and chimp CNCCs (Figures
1E and S4A).
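The reference-dependence filter can be pictured with the sketch below. The criterion it uses is an assumption for illustration: an element is kept only if it maps uniquely to both references and its direction of divergence, judged by a hypothetical log2 fold-change cutoff on reads from each species, is the same on hg19 and on the chimpanzee reference. The class and function names and the cutoff are hypothetical; the filter actually applied in this study is described in its Experimental Procedures.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Candidate:
    name: str
    unique_on_hg19: bool                   # maps uniquely to the human reference
    unique_on_chimp_ref: bool              # maps uniquely to the chimp reference
    counts_on_hg19: Tuple[float, float]    # (human reads, chimp reads) on hg19
    counts_on_chimp_ref: Tuple[float, float]  # (human reads, chimp reads) on the chimp reference


def log2_ratio(human: float, chimp: float, pseudocount: float = 1.0) -> float:
    """log2 of human over chimp read counts, with a pseudocount to handle zeros."""
    return math.log2((human + pseudocount) / (chimp + pseudocount))


def divergence_call(lfc: float, min_abs_lfc: float) -> int:
    """+1 human-biased, -1 chimp-biased, 0 no call at this (hypothetical) cutoff."""
    if abs(lfc) < min_abs_lfc:
        return 0
    return 1 if lfc > 0 else -1


def keep_candidate(c: Candidate, min_abs_lfc: float = 1.0) -> bool:
    """Keep elements that map uniquely to both references and whose
    divergence call does not change with the reference genome."""
    if not (c.unique_on_hg19 and c.unique_on_chimp_ref):
        return False
    call_hg = divergence_call(log2_ratio(*c.counts_on_hg19), min_abs_lfc)
    call_pt = divergence_call(log2_ratio(*c.counts_on_chimp_ref), min_abs_lfc)
    return call_hg == call_pt
```

A real analysis would test divergence across replicates rather than thresholding a single fold change; the point here is only that the call is computed twice, once per reference, and must agree.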

Genome-wide Annotation of Human and Chimpanzee 
CNCC Regulatory Elements Uncovers Enhancers with 
Craniofacial Activity 

To annotate enhancers genome wide, we promiscuously
identified candidate cis-regulatory regions by the presence of TF
or p300 enrichment and/or increased chromatin accessibility. 
We then restricted our analysis primarily to enhancers by assess- 
ing the ratio of H3K4me1/H3K4me3 enrichment at these candi- 
date sites, which distinguishes distal enhancers from promoters 
(Heintzman et al., 2007), and further using H3K27ac enrichment 
to differentiate active from inactive elements (Creyghton et al., 
2010; Rada-Iglesias et al., 2011). The resulting enhancer
candidates had enriched conservation signatures compared to
surrounding genomic regions and were near genes annotated
with craniofacial ontologies— consistent with bona fide NC 
enhancer status (Figures S3A-S3C). Furthermore, cross-refer- 
encing our enhancer list with the VISTA Enhancer Browser data- 
base (Visel et al., 2007) identified 247 regions overlapping CNCC 
enhancers that were functionally tested for activity in mouse
embryos. Of those 247 regions: (1) 208 were active at E11.5 (odds
ratio 6.33 and p < 5 × 10^-32), and (2) these 208 active enhancers
were significantly enriched for activity in NC-derived head
tissues (branchial arches and facial mesenchyme; Figure 1D;
examples are shown in Figures 1E [right] and S3D). Thus,
our analysis captures regulatory regions relevant for distinct 
Figure 1. Derivation of Human and Chimpanzee CNCCs and Epigenomic Annotation of Craniofacial Enhancers

(A) Workflow of comparative epigenomic strategy. 

(B) Confocal immunofluorescence detection of NC markers p75, TFAP2A, and NR2F1 in human and chimp CNCCs at passage 4. 

(C) RT-qPCR of NC markers, HOXs, and pluripotency markers OCT4 and NANOG in derived human and chimp CNCCs from two genetic backgrounds of each 
species. Error bars represent one SD. 

(D) Enrichment of annotated expression domain categories from overlap of the top 15,000 enhancer calls with regions in the VISTA enhancer database. p values were
calculated with Fisher’s exact test and corrected for pFDR. Categories with q value < 0.05 are indicated in red (enrichment) or blue (depletion).

(E) Representative UCSC Genome Browser tracks showing ChIP-seq profiles for p300 (red), H3K27ac (green), H3K4me1 (blue), H3K4me3 (brown), and TFAP2A 
(orange) from both species aligned to hg19 reference genome. Representative elements tested through the VISTA enhancer database (Visel et al., 2007) dis- 
played on the right next to the reported lacZ expression domains. 



spatial identities in the developing face in vivo (Figure 1D). Taken
together, our epigenomic approach thus comprehensively anno- 
tates putative human and chimp NC enhancers, at least a subset 
of which is active in facial structures during embryogenesis. 
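The two-step logic used to annotate enhancers (the H3K4me1/H3K4me3 ratio to separate distal enhancers from promoters, then H3K27ac to separate active from inactive elements) can be sketched as a simple classifier. The cutoffs, the pseudocount, and the function name below are hypothetical placeholders chosen for illustration, not the thresholds used in this study.

```python
import math


def classify_candidate(
    h3k4me1: float,
    h3k4me3: float,
    h3k27ac: float,
    min_log2_me1_me3: float = 1.0,  # hypothetical cutoff
    min_h3k27ac: float = 10.0,      # hypothetical cutoff (normalized reads)
    pseudocount: float = 1.0,
) -> str:
    """Label a candidate region from its normalized ChIP-seq signals."""
    # Distal enhancers carry relatively more H3K4me1 than H3K4me3; promoters show the reverse.
    log_ratio = math.log2((h3k4me1 + pseudocount) / (h3k4me3 + pseudocount))
    if log_ratio < min_log2_me1_me3:
        return "promoter-like"
    # Among enhancer-like regions, H3K27ac separates active from inactive elements.
    return "active enhancer" if h3k27ac >= min_h3k27ac else "inactive enhancer"
```

For example, a region with strong H3K4me1, weak H3K4me3, and high H3K27ac is labeled an active enhancer, while the same region without H3K27ac is labeled inactive.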

Quantitative Analysis of H3K27ac Enrichments Predicts 
Species-Biased Enhancers 

We hypothesized that, in closely related species, quantitative 
modulation of activity at orthologous regions is a major form 
of enhancer divergence. To identify such divergence, we used 
H3K27ac enrichment data in biological quadruplicate (i.e., inde- 
pendent CNCC derivations from each individual) to quantitatively 
approximate activity at all annotated CNCC enhancers detected 
for either species. Global comparisons of H3K27ac enrichments 
between individuals of the same species revealed high
concordance of signals, with some minor variation due to either
differences in genetic background or experimental variability (Figures
2A, highlighted in red, and S4A). Human and chimpanzee CNCC
H3K27ac enrichment was also highly correlated when mapped 
to the same reference genome, and human and chimpanzee 
CNCC H3K27ac profiles clustered together distinctly from 49 
other human cell types (Figures S4A and S4B). Despite this 
high conservation of profiles, a substantial subset of elements 
demonstrated a significant species bias (Figure 2A, FDR < 0.01 
highlighted in blue), which we thereafter considered to be our 
species-biased enhancer candidates. H3K27ac ChIP-qPCR at 
select candidate enhancers from independent CNCC deriva- 
tions recapitulated this species bias (Figure S4C). 

Importantly, consistent with the premise that H3K27ac is a 
suitable readout of enhancer activity, the bias in H3K27ac status 
alone was highly predictive of biases in TF and p300 binding, as 
well as chromatin accessibility (Figure 2C; examples are shown 
Figure 2. Identification of Species-Biased Enhancers Using H3K27ac Enrichments at Orthologous Loci

(A) Enrichment of H3K27ac at candidate enhancer elements compared within individuals of the same species (red) or across species (blue/black), with overlay 
shown on the right. Enhancers with significant inter-species divergence are indicated in blue (adjusted p < 0.01).

in Figures 2D and S4D). Furthermore, this approach enabled
genome-wide assignment of signed significance scores on a 
per-enhancer basis, visualizable as a genome browser track 
(Figure 2D, “Predicted Species Bias” track). 
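A per-enhancer signed score of this kind can be written out as a browser track. Following the Figure 2D legend, the sketch below takes the magnitude as -log10 of the adjusted p value and gives it a positive sign for human bias and a negative sign for chimp bias. The bedGraph output, the function names, and the coordinates in the example are assumptions for illustration.

```python
import math


def signed_bias_score(p_adj: float, log2_human_over_chimp: float) -> float:
    """-log10(adjusted p), signed positive for human bias and negative for chimp bias."""
    p_adj = max(p_adj, 1e-300)  # guard against log10(0)
    magnitude = -math.log10(p_adj)
    return math.copysign(magnitude, log2_human_over_chimp)


def bedgraph_line(chrom: str, start: int, end: int, score: float) -> str:
    """Format one enhancer as a tab-separated bedGraph record."""
    return f"{chrom}\t{start}\t{end}\t{score:.3f}"


# Hypothetical enhancers: one human-biased, one chimp-biased.
print(bedgraph_line("chr1", 1000, 1500, signed_bias_score(0.001, 2.0)))
print(bedgraph_line("chr1", 5000, 5500, signed_bias_score(0.01, -1.5)))
```

Each printed line is one bedGraph record; a file of such records can be loaded as a custom track in a genome browser.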

Altogether, of all annotated active human CNCC enhancers 
(n = 14,606), 84% were invariant, 4% fell at non-orthologous 
sites, and 6% and 7% demonstrated quantitative increase or 
decrease, respectively (Figure 2B). One limitation is the low 
number of currently available chimpanzee iPSC lines, especially 
given the high reported degree of polymorphism among chimps 
(Kaessmann et al., 1999). To estimate the false positive rate for
identifying truly fixed inter-species differences, we applied our
strategy to previously published ChIP-seqs from chimp lympho- 
blastoid cell lines and estimated a conservative FDR of 0.15 
when using only two chimp genetic backgrounds. This suggests 
that the vast majority of identified differences represent function- 
ally fixed differences across species (the rest represent en- 
hancers that are still divergent but remain polymorphic within 
one of the species). Our observations agree with the emerging 
notion that quantitative modulation of enhancer activity is the 
prevalent source of regulatory landscape divergence among 
closely related species. 

cis-Sequence Changes Drive Species-Biased Enhancer
Activity In Vitro and In Vivo 

To functionally validate our predictions, we used a luciferase re- 
porter assay to examine activity of a selected set of orthologous 
pairs of species-biased human and chimpanzee enhancers. We 
found that >80% of tested enhancers had correlated species 
bias in luciferase expression, which was consistent regardless 
of whether the reporter assays were performed in human or 
chimpanzee CNCCs (Figures 3A and 3B). These results further 
validate that H3K27ac identifies both enhancer activity and 
bona fide species bias; thus, for simplicity, we refer to 
H3K27ac enrichment interchangeably with “activity.” Impor- 
tantly, these results also demonstrate that enhancer divergence 
can be largely explained by cis-sequence changes rather than
differences in the trans regulatory environments of the human 
and chimp CNCCs. 
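The direction-of-effect comparison behind this validation can be sketched as follows: firefly signal is normalized to the Renilla transfection control for each replicate, the human and chimp orthologs are compared as a log2 ratio, and the sign is checked against the H3K27ac-based prediction. This is a minimal sketch with hypothetical function names; it does not reproduce the ANOVA and residual t tests used for significance in Figure 3.

```python
import math
from statistics import mean
from typing import List, Sequence


def normalized_activity(firefly: Sequence[float], renilla: Sequence[float]) -> List[float]:
    """Firefly luciferase signal divided by the Renilla transfection control, per replicate."""
    return [f / r for f, r in zip(firefly, renilla)]


def reporter_log2_bias(human_ff, human_rl, chimp_ff, chimp_rl) -> float:
    """log2(mean human activity / mean chimp activity); > 0 means human-biased."""
    human = mean(normalized_activity(human_ff, human_rl))
    chimp = mean(normalized_activity(chimp_ff, chimp_rl))
    return math.log2(human / chimp)


def agrees_with_prediction(reporter_bias: float, predicted_score: float) -> bool:
    """True when the reporter and the H3K27ac-based prediction point the same way."""
    return (reporter_bias > 0) == (predicted_score > 0)
```

Counting the fraction of tested pairs for which agrees_with_prediction is true gives the kind of concordance figure quoted above.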

The conservation of trans-environments across species
facilitates testing of human and chimp regulatory elements in vivo
using a mouse LacZ transgenic reporter assay. We selected two
predicted human-biased enhancers near CNTNAP2 (enhancer 
1) and PAPPA (enhancer 2), respectively (Figures 3C and 3D). 
For both predicted human-biased enhancers we observed gains 
of additional expression domains in head regions, as well as 
quantitative gains in enhancer strength, as evidenced by the
overall higher LacZ staining intensity for the human sequence
compared to the chimp ortholog (Figures 3C-3H and S5).
Notably, to ensure that the negative/weak staining results ob- 
tained with the chimp sequences were not a result of undersam- 
pling, we performed surplus embryo injections with both chimp 
enhancer reporters (Figure S5A). Thus, species-biased en- 
hancers identified in our in vitro analysis drive distinct expression 
patterns within CNCC-derived tissues in vivo. 

Human Accelerated Regions Overlap with Distal CNCC 
Enhancers 

Our results suggest that DNA sequence is the predominant 
driver of enhancer divergence; therefore, we began examining 
sequence properties of species-biased enhancers. Although 
species-biased enhancers were similar in H3K27ac enrich- 
ment levels when compared to invariant enhancers, they 
showed a distinct reduction of sequence conservation signa- 
tures (Figure 4A). Furthermore, we identified 163 “human 
accelerated regions” (HARs; Hubisz and Pollard, 2014) over- 
lapping active chromatin features in CNCCs, of which 20 
showed species-biased activity (at a cutoff of q < 0.001; n =
48 with a cutoff of q < 0.1) (Figures 4B and S6A-S6D), repre- 
senting a significant enrichment relative to the whole enhancer 
set (p < 0.025, odds ratio 1.81). It is possible that the HAR- 
overlapping regions without species bias in CNCC could man- 
ifest divergence in another tissue type, as exemplified by 
HAR2 (a.k.a. HACNS1), which overlaps an invariant CNCC
enhancer (Figure S6D, p value of species bias = 0.339) that 
has a pharyngeal arch activity domain that is conserved in pri- 
mates but has human-specific activity in the embryonic limb 
(Prabhakar et al., 2008). 

Species-Biased Enhancers Are Enriched for Specific 
Classes of Retroelements 

Given that nearly half of the human genome is composed of 
transposable elements, the majority of which invaded the pri- 
mate lineage prior to the separation of humans and chimpanzees 
(Cordaux and Batzer, 2009), we hypothesized that a subset of 
species-biased orthologous enhancers may be transposon 
derived. Interestingly, we found that, while CNCC enhancers 
overlapped with many different classes of repeats, specific sub- 
classes of endogenous retroviruses (ERV1, ERVL-MaLR, and 
ERVK) as well as L1 elements were preferentially enriched at
species-biased enhancers (Figure 4C), suggesting that these 
specific subclasses may harbor progenitor sequences that are 
prone to acquire CNCC enhancer activity over relatively short 
evolutionary distances. 



(B) Pie charts showing the percentage of total active CNCC enhancers classified as either species-biased enhancers with gained activity (green), species-biased 
enhancers with decreased activity (purple), enhancers without clear orthology across genomes (yellow), or invariant enhancers (blue) using a human reference 
genome (above) or chimp reference genome (below). 

(C) Heatmap of raw ChIP-seq and ATAC-seq counts across species-biased and invariant CNCC enhancers for two human and two chimp genetic backgrounds. 
Each row represents a 2 kb window (1 kb in each direction) centered on the middle of human-biased (n = 598, q < 0.0001), chimp-biased (n = 691, q < 0.0001), or
invariant (n = 584 representative subset, q > 0.95) enhancers for H3K27ac (green), p300 (red), TFAP2A (yellow), H3K4me1 (blue), and ATAC-seq (gray). All reads were
aligned to hg19. 

(D) Representative browser tracks showing overlaid H3K4me1 (blue), p300 (red), and H3K27ac (green) from human and chimp CNCCs mapped to hg19. Ex- 
amples of strongly human-biased, weakly human-biased, or strongly chimp-biased enhancers highlighted in pink. Predicted species-bias track shown above for 
candidate enhancers; the magnitude of the bias track represents -log10(adjusted p value of divergence), with negative sign (indigo) representing chimp bias and
positive (bronze) human bias. 
Figure 3. In Vitro and In Vivo Validations of Species-Biased Enhancers

(A and B) Luciferase reporter assays performed in chimp CNCCs (A) or human CNCCs (B) for 9 chimp-biased regions (and orthologous human regions) and 8 human- 
biased regions (and orthologous chimp regions). Luciferase signal was normalized to renilla transfection control. Significance tested from three biological replicates 
from each species with ANOVA followed by residuals testing with Student’s t test. *p < 0.05, **p < 0.01, ***p < 0.001. Central bar represents the median, box outline
represents the first and third quartiles, and whiskers extend to the furthest data point within 1.5× the box length from the box. Tested enhancers are named by nearest gene.

(C and D) Genome browser tracks showing human-biased enhancer 1 (near CNTNAP2 gene; C) and enhancer 2 (near PAPPA gene; D) selected for a lacZ reporter 
mouse transgenesis assay. 

(E and F) Analysis of enhancer activity for chimpanzee and human enhancer 1 in a lacZ reporter transgenic mouse assay. (E) Representative E11.5 transgenic
embryo obtained for the chimpanzee enhancer 1 reporter, shown in lateral view (left) or frontal view (right) of the embryonic head. (F) Representative E11.5

Sequence Substitutions within TF Binding Motifs at
Species-Biased Enhancers Contribute to Epigenomic 
Divergence 

Consistent with the expectation that species-specific biases 
are largely sequence driven, we observed that the variance in 
H3K27ac between species at each enhancer scales proportion- 
ally with the degree of sequence dissimilarity (i.e., Levenshtein 
distance) at those orthologous sites, while the intra-species 
variance at the same regions remains unchanged (Figure 4D). 
Nonetheless, even at enhancers with detectable species bias, 
sequence substitutions were still infrequent— only ~3-6 substi- 
tutions per 500 bp enhancer— suggesting that a small number 
of mutations can confer substantial effects on overall enhancer 
activity, likely by affecting binding of key sequence-dependent 
TFs. We therefore interrogated how frequently sequence substi- 
tutions fall within particular classes of TF motifs and to what de- 
gree these mutations correlate, either positively or negatively, 
with changes in enhancer activity or other chromatin modifica- 
tions (Figure 4E). This, in essence, leverages preexisting genetic 
variation like a large-scale mutagenesis screen. 
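Sequence dissimilarity between orthologous enhancers, measured here as Levenshtein distance, can be computed with the standard dynamic-programming recurrence sketched below. Rescaling the distance to a 500 bp window is an assumption added for illustration, to make the result comparable with the ~3-6 substitutions per 500 bp quoted above; the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions, and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def edits_per_500bp(human_seq: str, chimp_seq: str) -> float:
    """Levenshtein distance rescaled to a 500 bp window for comparison across enhancers."""
    length = max(len(human_seq), len(chimp_seq))
    if length == 0:
        return 0.0
    return levenshtein(human_seq, chimp_seq) * 500 / length
```

Only two rows of the dynamic-programming table are kept in memory, which is enough for enhancer-sized sequences.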

Through this approach, we identified a large set of both known 
and novel motifs for which deviation from the consensus was 
correlated with species bias of H3K27ac and other epigenomic 
marks, implying functional consequences for these mutations. 
As expected, the correlations vary in frequency and in effect, 
with some motifs being frequent and having small effects (e.g., 
Forkhead factors) and others being infrequent but conferring 
large effects (e.g., TFAP2A), with one outlier motif being both 
very frequent and conferring large effects when mutated (see 
description of the “Coordinator” motif below) (Figure 4F). Among 
our top hits, we identified many motifs for TFs with known effects 
in NC regulation, including a set of TFAP2 motif variants that 
serve as a positive control for our approach, as we see a high 
correlation between TFAP2 motif mutations and inter-species 
divergence in TFAP2A ChIP signals at these sites (Figure 4G, 
group 3). We previously showed that TFAP2A participates 
in establishment of active chromatin states at NC enhancers 
(Rada-lglesias et al., 2012), and consistently we observed that 
divergence from the TFAP2A consensus also correlates with 
the loss of H3K27ac, co-activator binding, and chromatin acces- 
sibility. Notably, TFAP2 motifs are depleted from species-biased 
sites, likely due to strong selective pressure to conserve TFAP2A 
function in the NC and possibly in other pleiotropic contexts (Fig- 
ure 4F). Another interesting set of motifs, which are both frequent 
at species-biased sites and positively correlated with permissive 
chromatin states, are those recognized by ALX homeobox fac- 
tors that are highly expressed in the face and mutated in severe 
frontonasal dysplasias in humans (Twigg et al., 2009) (Figures 4F 
and 4G, group 2). 
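The x axis of Figure 4F reports motif enrichment at species-biased enhancers relative to all enhancers as a log odds ratio. The sketch below shows that arithmetic. Several details are assumptions not given in the text: the natural logarithm, a 0.5 pseudocount added to each cell of the 2×2 table, and the function name `motif_log_odds`.

```python
import math


def motif_log_odds(biased_with: int, biased_total: int,
                   background_with: int, background_total: int,
                   pseudocount: float = 0.5) -> float:
    """Log odds ratio of motif presence at biased sites versus a background set.

    The pseudocount (an assumed choice) keeps the ratio finite when a cell is zero.
    """
    if not (0 <= biased_with <= biased_total and 0 <= background_with <= background_total):
        raise ValueError("counts must satisfy 0 <= with <= total")
    a = biased_with + pseudocount                         # biased, motif present
    b = biased_total - biased_with + pseudocount          # biased, motif absent
    c = background_with + pseudocount                     # background, motif present
    d = background_total - background_with + pseudocount  # background, motif absent
    return math.log((a / b) / (c / d))
```

Positive values mark motifs that are overrepresented at species-biased sites, such as the ALX motifs. Negative values mark depleted motifs, such as TFAP2.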



Intriguingly, we also identified a group of motifs whose muta- 
tions away from the consensus were correlated with a gain in 
chromatin accessibility and H3K27ac, suggesting that these 
motifs may recruit repressive factors with negative effects on 
overall enhancer activity. Examples of such motifs included 
the SNAI2 motif, which is bound by a known transcriptional 
repressor, the TBX family motif bound by T-box factors, and 
other candidate negative regulators representing distinct TF 
classes, e.g., HIC1/2, MESP1, TCF3/4, and GLIS1 (Figure 4G,
group 1). These results suggest an unappreciated prevalence 
of repressive inputs in quantitative modulation of enhancer 
activity. 

“Coordinator”: A Novel Motif that Is Highly Predictive of
Active Chromatin States and Species Bias 

Surprisingly, one motif stood out as an outlier in this analysis, as 
it was exceptionally enriched at divergent sites and was the most 
correlated with changes in all examined active chromatin fea- 
tures (Figures 4F, upper-right, and 4G, far-right). This sequence, 
which we termed the “Coordinator” motif, is a 17-bp-long motif,
which we identified through de novo motif discovery from our 
CNCC-specific enhancers and was not previously annotated to 
a known regulatory complex. We note that portions of the Coor- 
dinator resemble an E box and HOX-like motifs; however, these 
represent large protein families, and the particular factors that 
bind at this element remain to be identified. 

Sequence analysis using INSIGHT, a tool to infer signatures 
of recent natural selection using human polymorphism data 
(Gronau et al., 2013), found evidence of positive selection at 
the Coordinator motif occurrences within species-biased en- 
hancers, but not within invariant enhancers, suggesting that 
the motif and its cognate binder(s) have played a privileged 
role in recent enhancer divergence in primate CNCCs (Figure 5A). 
When we further dissected the motif by individual bases, we 
found that the correlations of each nucleotide with ChIP enrich- 
ments (both for histone modifications and TF ChIPs) recapitu- 
lated the information content of the motif itself, as would be 
expected if Coordinator motif mutations were causal for the 
observed chromatin changes (Figure 5B). Fittingly, we found hu- 
man mutations that strengthen the Coordinator motif within both 
human-biased enhancers tested in mouse transgenesis (Fig- 
ure S6E). Globally, the Coordinator motif was preferentially en- 
riched at distal regulatory elements rather than at promoters 
(Figure S6F) and was further enriched at enhancers that were 
CNCC specific as opposed to those that shared measurable 
H3K27ac in other tissue types (Figure 5C). Interestingly, we 
observe that LTR9 elements, a retroelement class enriched at 
species-biased enhancers, are 5× more likely to harbor a Coor-
dinator motif variant than MER52A elements, a similar repeat 



transgenic embryo obtained for the human enhancer 1 reporter, shown in lateral view (left) or frontal view (right). Midbrain/hindbrain junction (MHJ); periocular 
mesenchyme (POM); lateral and medial nasal processes (LNP and MNP); maxillary (Mx) and mandibular (Md) processes of branchial arch 1 (BA1) and BA2.
Scale bars: 100 μm (left images) and 50 μm (right images).

(G and H) Analysis of enhancer activity for chimpanzee and human enhancer 2 in a lacZ reporter transgenic mouse assay. (G) Representative E11.5 transgenic
embryo obtained for the chimpanzee enhancer 2 reporter, shown in lateral view (left) or frontal view (right) of the embryonic head. (H) Representative E11.5
transgenic embryo obtained for the human enhancer 2 reporter, shown in lateral view (left) or frontal view (right). Midbrain (Mb); cranial nerves 8 and 10 (N8 and
N10, respectively); sympathetic ganglia (SG); telencephalic midline groove (TMG); midbrain/hindbrain junction (MHJ); maxillary (Mx) and mandibular (Md) processes
of branchial arch 1 (BA1) and BA2. Scale bars: 100 μm (left images) and 50 μm (right images).
Figure 4. Global Features of Species-Biased Enhancers and Correlation of Mutations within TF Binding Motifs with Epigenomic Divergence

(A) Average PhastCons scores are shown for strong invariant enhancers (q > 0.98), strongly human-biased enhancers (q < 0.0001), or strongly chimp-biased 
enhancers (q < 0.0001) for 1 kb surrounding each enhancer center. 

(B) Degree of species bias (log2 fold change H3K27ac human/chimp, y axis) relative to enhancer strength (human-chimp-averaged H3K27ac enrichment, x axis)
for bulk CNCC elements (black) and elements overlapping HARs (color representing q value of species bias: q < 0.1 in red, q > 0.1 in green). 

(C) Counts of repeat families overlapping species-biased enhancers (y axis) relative to counts of repeat families overlapping all CNCC regulatory sites (x axis) are 
plotted; q values of enrichment for different repeat classes are indicated by color.

Figure 5. Properties of the Novel “Coordinator” Motif

(A) Expected number of adaptive substitutions (E[A]) per kilobase and expected number of deleterious mutations E[W] per kilobase were calculated for all sites of 
the Coordinator motif at invariant enhancers (green), at human-biased enhancers (red), and at chimp-biased enhancers (blue) using default INSIGHT parameters 
(Gronau et al., 2013). Significance indicated by * (p < 0.01). Overall fractions of nucleotides under selection (ρ) are not shown (ρ_invariant = 0.66, p < 0.01; ρ_human-biased =
0.015, p < 0.01; ρ_chimp-biased = 0.019, p < 0.01). Error bars represent approximate SE.

(B) Position weight matrix of the Coordinator consensus sequence from the top 3,000 CNCC-specific enhancers is shown (top) relative to logo of mutations preferred
at more acetylated (H3K27ac) alleles (middle) versus mutations at less acetylated alleles (bottom). 

(C) Enhancers were scored for H3K27ac ChIP-seq enrichments in 30 cell types from public data sets and binned by number of tissues with activity (1 to 31). The
fraction of enhancers per bin with recognizable Coordinator motif (p < 0.0001) is indicated on y axis. 

(D) Four different versions (V1-V4) of the Coordinator motif were cloned in tandem into luciferase reporter vectors and were tested for transactivation activity in
human CNCCs. Luciferase activity was normalized to the Renilla transfection control. Error bars represent one SD.

(E) Comparison of sequence changes within the Coordinator motif with a reconstructed human-chimp ancestral outgroup. Changes in fit to the Coordinator 
consensus compared to the ancestral ortholog (-log10 p value) were plotted as orthographic projections along space diagonals for all occurrences of the motif for
both human and chimpanzee lineages at different classes of sites. Overlapping data points were jittered for better visualization. Schematic is shown on the far left. 



class depleted from species-biased sites. Even at sites without 
activity in CNCCs, LTR9 sequences are 3.7× more likely to har-
bor a Coordinator-like motif than MER52A, consistent with the 
idea that a preexisting Coordinator-like progenitor sequence 



contributed to the recent adaptation of some retroelements for 
CNCC enhancer function. Lastly, we found that the Coordinator 
motif alone was able to drive activity in luciferase reporter assays 
in CNCCs (Figure 5D). 
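The per-base dissection of the Coordinator motif (Figure 5B) is compared against the information content of each motif position. The sketch below illustrates that quantity. It computes per-position information content in bits from a position frequency matrix. It assumes a uniform 25% background for each base, although real genomic backgrounds are not uniform, and it takes an optional pseudocount. The function name and the toy counts are hypothetical and do not represent the Coordinator matrix.

```python
import math
from typing import Dict, List

BASES = "ACGT"


def information_content(counts: List[Dict[str, float]],
                        pseudocount: float = 0.0) -> List[float]:
    """Bits of information per motif position, assuming a uniform background.

    counts: one dict per position mapping base -> observed count.
    Maximum is 2 bits (a single base always observed); minimum is 0 bits.
    """
    ic = []
    for column in counts:
        total = sum(column.get(b, 0.0) + pseudocount for b in BASES)
        if total <= 0:
            raise ValueError("each position needs at least one count or a pseudocount")
        entropy = 0.0
        for b in BASES:
            p = (column.get(b, 0.0) + pseudocount) / total
            if p > 0:  # 0 * log(0) is taken as 0
                entropy -= p * math.log2(p)
        ic.append(2.0 - entropy)  # log2(4) = 2 bits under a uniform background
    return ic
```

Positions with high information content are the ones where substitutions should most strongly alter binding. This matches the observation that nucleotide-level correlations with ChIP signal recapitulate the motif logo.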



(D) Pairwise H3K27ac variance (σ²(ld) − σ²(ld = 0)) at enhancers across samples, ranked by increasing sequence dissimilarity counted by Levenshtein distance (ld)
between human (hg19) and chimp (panTro3) orthologous 200 bp enhancers, relative to ld = 0. Comparison between samples of different species shown in black;
same species shown in red (means represented by thick lines). 

(E) Schematic showing method for deriving the correlation coefficient. For a given motif, each occurrence genome wide containing a genetic change across 
species is plotted as Δ(-log10 p value) (human/chimp) of the fit to consensus (x axis) versus ΔH3K27ac for the overlying enhancer region (human/chimp) (y axis),
then a line is fit. The slope of the line represents the correlation coefficient for that given motif and epigenomic modification genome wide. 

(F) Enrichments of classes of motifs at species-biased enhancers over all enhancers (log odds ratio, x axis) plotted relative to genome-wide correlation coefficient 
calculated for each motif (using H3K27ac), as described in E (y axis). 

(G) Genome-wide correlation coefficients were calculated for whole databases of annotated motifs and multiple chromatin features, revealing motifs with large 
influence on epigenomic profiles. Correlation coefficients are bi-clustered per motif, and resulting changes in enrichment of chromatin features (p300, K27ac, 
TFAP2A, H3K4me1, H3K4me3, NR2F1, ATAC) at all enhancers containing mutated PWMs are represented by color. Individual subclusters are magnified below
with corresponding motifs indicated. 
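Panel D bins enhancers by the Levenshtein distance between orthologous 200 bp human (hg19) and chimp (panTro3) sequences. The sketch below is a standard dynamic-programming implementation of that distance, keeping only two rows in memory. The function name is hypothetical. The test strings are toy examples, not enhancer sequences.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-base insertions, deletions, or substitutions
    needed to turn sequence a into sequence b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence as the row dimension
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]
```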
Figure 6. Clusters of Regulatory Divergence Overlap Loci with Crucial Roles in Trait Variation and Are Predictive of Expression Bias

(A) Mean normalized human expression (x axis) versus mean normalized chimp expression (y axis) for genes associated with human-biased enhancers (q < 0.001,
blue) or with chimp-biased enhancers (q < 0.001, red). Only genes with significant inter-species expression change (padj < 0.1) are shown.

(B) Violin plots showing log2 fold change human/chimp H3K27ac enrichment at orthologous enhancers binned by total count of biased enhancers (total number of
human-biased enhancers minus total number of chimp-biased enhancers) within 250 kb of promoter regions for genes with significant differences in expression
across species (padj < 0.1).

Sequence Analysis Reveals the Recent Evolutionary
History of Coordinator Motif Changes 

Our results suggest that nucleotide changes within Coordinator 
motif sites represent an important class of “causative” muta- 
tions predictably associated with gain or loss of CNCC enhancer 
activity. Thus, by comparing the fit to the consensus for Coordi- 
nator-like motifs with a reconstructed ancestral outgroup, we 
can infer the polarity of enhancer activity change in each lineage 
relative to the common human-chimp ancestor. Using this strat- 
egy, we observed that human-biased enhancers contain Coordi- 
nator-like sequences that were equally prone to: (1) a gain in the 
fit in the human lineage (n = 300) or (2) a loss in the fit in the chimp 
lineage (n = 255) relative to the ancestral state (Figure 5E). How- 
ever, human-biased enhancers contain almost no examples in 
which there was a gain of Coordinator fit in the chimp lineage 
or loss in the human lineage, an important validation of our anal- 
ysis. Conversely, we see that chimp-biased enhancers are simi- 
larly prone to gains of the Coordinator motif in the chimp lineage 
(n = 218) versus losses in the human lineage (n = 255) and again,
with almost no gains in human or losses in chimp. Thus, there ap- 
pears to be no preferred direction of enhancer divergence in 
either lineage since the split from our common ancestor for this 
class of sites. We also applied our analysis to hominin outgroups 
such as Denisovans and Neanderthals and found that, as ex- 
pected given the much more recent split from the common 
ancestor, these lineages primarily share the human-like variants 
of the Coordinator motif at species-biased sites (Figure S6G). 
Therefore, even for individuals substantially more diverged 
than any modern human, most changes are present in the hom- 
inin lineage relative to the human-chimp ancestor. However, 
there is a small set of changes that are unique to modern humans 
compared to other hominins, and those clearly merit further 
exploration. 
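The polarity logic described above can be sketched as a simple classifier. This is a hypothetical illustration. Each site is assumed to carry a fit-to-consensus score (for example, -log10 p) for human, chimp, and the reconstructed ancestor. `min_delta` is an invented threshold. Attributing each site to whichever lineage changed more is a simplifying assumption. At human-biased enhancers, the expected labels are "human gain" and "chimp loss". A surplus of "chimp gain" or "human loss" would indicate a problem with the analysis.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_change(human: float, chimp: float, ancestor: float,
                    min_delta: float = 1.0) -> str:
    """Assign a motif-fit change to the lineage that departed most from the ancestor.

    min_delta is a hypothetical noise threshold on the fit score.
    """
    d_human = human - ancestor
    d_chimp = chimp - ancestor
    if max(abs(d_human), abs(d_chimp)) < min_delta:
        return "no change"
    if abs(d_human) >= abs(d_chimp):
        return "human gain" if d_human > 0 else "human loss"
    return "chimp gain" if d_chimp > 0 else "chimp loss"


def tally(sites: Iterable[Tuple[float, float, float]],
          min_delta: float = 1.0) -> Counter:
    """Count labels over (human, chimp, ancestor) score triples."""
    return Counter(classify_change(h, c, a, min_delta) for h, c, a in sites)
```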

Species-Biased Enhancers Flank Genes that Show 
Species-Biased Expression 

Recent studies suggest that gene expression levels are more 
evolutionarily conserved than utilization of cis-regulatory ele-
ments and can be buffered by redundant or compensatory ele- 
ments regulating the same loci (Hong et al., 2008; Odom et al., 
2007; Schmidt et al., 2010; Vierstra et al., 2014; Wong et al., 
2014). Nonetheless, at least some of the species-biased en- 
hancers should be associated with transcriptional changes at 
nearby genes if they are responsible for morphological variation. 
To test this, we performed RNA-seq analyses of our human and 
chimp CNCC populations and identified genes whose expres- 
sion significantly diverged between, but not within, species. 
We found that genes with significantly divergent expression 
between humans and chimpanzees are strongly enriched for 



nearby species-biased enhancers, with human-biased genes 
flanked by human-biased enhancers and chimp-biased genes 
flanked by chimp-biased enhancers (Figure 6A). In addition, we 
observed that the fraction of species-biased genes (but not the 
degree of the expression bias) scales with the number of flanking 
enhancers biased toward the same species (Figure 6B). 
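The binning used in Figure 6B counts human-biased minus chimp-biased enhancers within 250 kb of each promoter. The sketch below illustrates that count. It assumes that enhancers are represented by their midpoints, that the promoter is represented by a single TSS coordinate, and that all positions lie on the same chromosome. The function name and the labels "human" and "chimp" are hypothetical.

```python
from typing import Iterable, Tuple


def net_bias_near_promoter(tss: int,
                           enhancers: Iterable[Tuple[int, str]],
                           window: int = 250_000) -> int:
    """Human-biased minus chimp-biased enhancers within `window` bp of a TSS.

    enhancers: (midpoint, bias_label) pairs on the same chromosome as the TSS.
    Labels other than "human" or "chimp" (e.g. invariant sites) are ignored.
    """
    net = 0
    for midpoint, bias in enhancers:
        if abs(midpoint - tss) > window:
            continue  # outside the flanking window
        if bias == "human":
            net += 1
        elif bias == "chimp":
            net -= 1
    return net
```

A positive result places the gene in a bin dominated by human-biased enhancers. The observation above is that the fraction of species-biased genes grows with this count.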

Clusters of Regulatory Divergence Flank Loci Involved in 
Intra-Human Facial Variation 

Interestingly, we found that strongly divergent enhancers were 
not distributed at random throughout the genome but instead 
were likely to fall in close genomic proximity to other species- 
biased enhancers matching in polarity (Figure S7A), suggesting 
that divergent enhancers fall into regulatory clusters. To system- 
atically locate these clusters, we calculated a genome-wide 
divergence score using a moving window over the nearest ~10 
enhancers for each species, integrating both the degree and 
genomic span of divergent enhancers in series (Figure S7B). 
This strategy revealed a low baseline encompassing the bulk 
of interspersed species-biased enhancers (examples of Chr11
in Figures S7C and S7D, top) but exposed a subset of regions 
throughout the genome (~1-4 per chromosome), with a marked 
increase in their divergence score resulting from presence of 
dense clusters of strongly biased enhancers (Figure 6C). Impor- 
tantly, we find that these clusters of divergence do not emerge 
simply by chance due to increased frequency of enhancers 
near highly active CNCC genes (Figures S7C and S7D). 
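The exact divergence score is defined in the supplementary material (Figure S7B) and is not reproduced here. The sketch below is a hypothetical stand-in that follows only the verbal description given above. It slides a window over the nearest k enhancers, sums the bias in the chosen direction (degree), and divides by the window's genomic span so that dense clusters score higher. The span offset is an invented parameter that avoids division by zero. The formula is illustrative and is not the authors' score.

```python
from typing import List, Tuple


def divergence_scores(enhancers: List[Tuple[int, float]], k: int = 10,
                      direction: int = 1,
                      span_offset_kb: float = 10.0) -> List[float]:
    """Hypothetical windowed divergence score for each enhancer.

    enhancers: (position, log2FC human/chimp) on one chromosome, sorted by position.
    direction: +1 scores human bias, -1 scores chimp bias.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(enhancers)
    half = k // 2
    scores = []
    for i in range(n):
        # Window of up to k consecutive enhancers, roughly centred on enhancer i.
        lo = max(0, i - half)
        hi = min(n, lo + k)
        lo = max(0, hi - k)
        window = enhancers[lo:hi]
        degree = sum(max(0.0, direction * lfc) for _, lfc in window)
        span_kb = (window[-1][0] - window[0][0]) / 1000.0
        scores.append(degree / (span_kb + span_offset_kb))
    return scores
```

In this toy formulation, a tight run of strongly biased enhancers produces a sharp peak above a low baseline, which is qualitatively the behavior described for the real score.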

When ranking all human- and chimp-biased enhancers ac- 
cording to their divergence score, we observed an inflection in 
the distribution (Figures 6D for human, 6E for chimp). Using 
this inflection point as a cutoff, we identified 32 human and 65 
chimp clusters of divergence, spanning genomic windows of, 
on average, ~500 kb and encompassing ~11.9% of all spe- 
cies-biased enhancers. Of note, while some clusters overlapped 
super-enhancers in CNCCs, most super-enhancers were not 
identified as a species-biased cluster and many species-biased 
clusters did not encompass super-enhancers, indicating that 
these two entities are distinct (Whyte et al., 2013). 
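The text does not specify how the inflection point was located. One common approach, used for super-enhancer calling in the work cited above (Whyte et al., 2013), normalizes the ranked scores and takes the point where the tangent slope reaches 1. That point is equivalently the minimum of (normalized score minus normalized rank). The sketch below borrows that rule as an assumption. It is not necessarily the criterion the authors applied, and the function name is hypothetical.

```python
from typing import List


def inflection_cutoff(scores: List[float]) -> float:
    """Score at which the normalized ranked curve has slope 1 (tangent rule).

    Elements scoring strictly above the returned value would be called clusters.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    ordered = sorted(scores)
    n = len(ordered)
    lo, hi = ordered[0], ordered[-1]
    if hi == lo:
        return hi  # flat distribution: no inflection
    # Minimizing y_norm - x_norm finds where the curve's slope crosses 1.
    best_i = min(range(n),
                 key=lambda i: (ordered[i] - lo) / (hi - lo) - i / (n - 1))
    return ordered[best_i]
```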

We speculate that these species-biased enhancer clusters 
represent broad cis-regulatory regions under strong evolutionary
pressure to diverge and hypothesize that they may contain genes 
with central roles in the regulation of NC-associated phenotypes. 
Indeed, these regions fall immediately over or next to genes that 
are critical in facial morphogenesis, including PRDM16, MN1, 
COL17A1, EDNRA, PAX3, PAX7, SOX10, and ALX4. Intriguingly,
of five chromosomal regions linked to normal-range human facial 
variation in GWAS, three (PRDM16, COL17A1, and PAX3) fall 
directly within these regions of high divergence. Importantly, 
the clusters were highly predictive of changes in nearby gene 



(C) Representative browser tracks showing clusters of species-biased enhancers. Top panel shows broad view with predicted species-bias track (human-biased 
in orange, chimp-biased in blue) and the corresponding H3K4me1 (blue), p300 (red), and H3K27ac (green) from two individuals of each species shown in overlay. 
Boundaries of the cluster are indicated by a red block. Close-up of an individual cluster of biased enhancers is shown below. All chromatin features are mapped to 
hg19. 

(D and E) Distribution of divergence scores at human-biased enhancers (D) and chimp-biased enhancers (E). Selected genes falling within identified clusters are 
highlighted next to the enhancer in the cluster with highest divergence score. 

(F) Mean normalized human expression (x axis) versus mean normalized chimp expression (y axis) for genes within or flanking human-biased enhancer clusters 
(blue) or chimp-biased enhancer clusters (red). 
(see Figure S4B);



Heterozygous loss-of-function mutations cause Waardenburg syndrome, 
characterized by craniofacial, auditory and pigmentation defects; in model 
organisms Pax3 is involved in induction, specification and differentiation of 
neural crest cells and craniofacial development; in GWAS studies the locus was 
associated with normal-range variation of facial morphology in Europeans. 



Asher et al., 1996;
Involved in early specification of the neural crest in the embryo; loss of function in
the mouse leads to reduction of the maxilla and a pointed snout. 
Mice deficient for Ednra exhibit cranial and cardiac neural crest defects. Most
lower jaw structures in Ednra−/− embryos undergo a homeotic transformation into
maxillary-like structures; other defects include absence of tympanic rings, malleus, 
and incus, and the rostral relocation of the hyoid bone. 
chr20:58028993-58029192; chr20:58154612-58154811 (see Figure S4B);


Heterozygous or homozygous mutations associated with several human 
neurocristopathies, including Waardenburg syndrome type 4B, Hirschsprung's
disease, and Congenital Central Hypoventilation Syndrome (CCHS); in animal 
models EDN3 is involved in the regulation of coat pigmentation and enteric neuron 
function. 
Involved in guidance of NCC migration and restricting migratory paths of cranial and
trunk NCCs, positioning sensory neurons and organizing their projections.

Gammill et al., 2007;
Schwarz et al., 2008;
Yu and Moens, 2005;
Ephrin B signaling is involved in targeting and restricting neural crest migration
within branchial arches; compound EphB2/B3 knockout in mice leads to cleft palate.

Orioli et al., 1996;
Risley et al., 2009;
Smith et al., 1997;
Proposed to be a crucial mediator of beak shape changes in Darwin’s finches and of
craniofacial shape and morphological adaptive radiation in Cichlid fish.
CNCC-specific overexpression of BMP4 during mouse development results in a dramatic
change of facial shape, with shortening in both the mandible and maxilla, rounding
of the skull shape, and more anterior orientation of the eyes. In humans,
mutations/variants in BMP4 are associated with orofacial clefts, microphthalmia,
and age at primary tooth eruption.
Wu et al., 2004;



2.2758E-05 chr7:33525807-33526006; chr7:33526320-33526519; chr7:33527256-33527455;
chr7:33540319-33540518; chr7:33540928-33541127 (see Figure S4B);



Negative regulator of BMP4 function in osteoblast and chondrocyte differentiation 
(see also BMP4). In humans, homozygous or heterozygous mutations in 
BMPER are associated with a skeletal disorder, diaphanospondylodysostosis, 
whose consistent craniofacial features include ocular hypertelorism, epicanthal 
folds, depressed nasal bridge with short nose, and low-set ears. 



Binnerts et al., 2004; 
Funari et al., 2010; 
Moser et al., 2003; 



1.2564E-11 chr4:111230391-111230590; chr4:111230942-111231141; chr4:111820988-111821187

PITX2 haploinsufficiency is associated with Axenfeld-Rieger syndrome involving
ocular anterior segment dysgenesis, tooth anomalies, and craniofacial anomalies
such as maxillary hypoplasia with mid-face flattening and prominent forehead; in
mice, ocular manifestations are largely recapitulated by the neural crest-specific
knockout of Pitx2; genetically interacts with FOXC1, see also FOXC1.



Evans and Gage, 2005; 
Lu et al., 1999; 

Matt et al., 2005; 
Semina et al., 1996 



3.4267E-02 chr6:1744897-1745096



2.1354E-02 



chr2:105024721-105024920
chr2:104990082-104990281
chr2:104989534-104989733
chr2:104937657-104937856



Heterozygous mutations in FOXC1 are associated with Axenfeld-Rieger syndrome 
(A-RS) involving ocular anterior segment dysgenesis, tooth anomalies, and 
maxillary and mandibular hypoplasia; dosage-dependent interactions with another 
A-RS gene PITX2 have been observed; in mice, loss of Foxc1 results in bony
syngnathia, defects in maxillary and mandibular structures, and agenesis of the
temporomandibular joint; see also PITX2.

In mice, knockout leads to loss of the squama temporalis and fusion of the stapes to the styloid
process.



Berry et al., 2006; 

Inman et al., 2013;

Kelberman et al., 2011; 

Mears et al., 1998; 

Turner and Bach-Holm, 2009; 



Dheedene et al., 2014; 
Jeong et al., 2008; 



expression for the bulk of the associated genes in the region (Fig-
ure 6F), suggesting that either (1) multiple genes in the vicinity
must be under coordinated selection for these super-divergent 
regions to emerge or, more likely, that (2) strong selection on 
one or a few target genes could drive changes in the local 
enhancer landscape that have secondary effects on other genes 
in the vicinity. Altogether, we provide evidence that highly diver- 
gent clusters of tissue-specific enhancers may promote inter- 
species and intra-species phenotypic variation. 

Resource for Studies of Human Morphological Evolution 

In addition to informing the basic mechanisms underlying the cis-
regulatory divergence of human and chimpanzee NC, our study 
also provides a rich resource for future investigations of morpho- 
logical evolution of human craniofacial traits. Ontology annota- 
tions of all significantly species-biased enhancers reveal strong 
associations with processes important for various craniofacial 
structures that are diverging between humans and chimps (Figure 7A).
As examples, we highlight some of the most interesting diver- 
gent candidate genes in Figure 7B. These featured loci show 
species-biased expression in our RNA-seq and also map to re- 
gions with species-biased enhancer divergence and are empha- 
sized due to their known associations with CNCC development 
and/or facial morphology. Nonetheless, it is important to bear 
in mind that the biases in gene expression and enhancer states 
highlighted in Figure 7 refer to the relative change between hu- 
man and chimpanzee CNCCs, without ascribing the polarity of 
the change with respect to the ancestral status. 

Our divergently expressed genes are known to be involved 
in multiple, distinct developmental processes that cooperate 
to influence differential allocation of CNCCs in facial primordia 
and, in turn, contribute to species-specific morphology (Fish 
et al., 2014). These processes (and associated species-biased 
genes) include: (1) CNCC specification (e.g., PAX3, PAX7), (2) 
migration and guidance of CNCC migratory paths (e.g., 
EPHB2, NRP2, EDNRA, EDN3), (3) modulation of CNCC prolifer- 
ation at facial primordia (e.g., BMP4), and (4) regulation of CNCC 
differentiation (e.g., PITX2). Moreover, heterozygous mutations 
in many of these genes (e.g., PAX3, PITX2, FOXC1, EDN3,
BMPER) are associated with human syndromes that include 
craniofacial manifestations, suggesting that altered gene dosage 
can drive both morphological variation between species and, 
below a certain threshold, disease-associated malformations 
(Figure 7B). Furthermore, many phenotypes of the highlighted 
genes affect aspects of head morphology that have diverged 
between humans and chimps (e.g., size of the mandible and 
maxilla, skull shape, and pigmentation) (Figure 7B and Discus- 
sion). Altogether, our study provides a wealth of candidate loci 
for further deep exploration in studies of human evolution and 
variation. 



DISCUSSION 

Our study utilizes primate cellular models to provide a compre- 
hensive map of human and chimp regulatory divergence in a 
tissue with central relevance to the development of the head 
and face. We show that a common mechanism of regulatory 
divergence in higher primates is quantitative modulation of 
orthologous elements, driven largely through small numbers 
of sequence changes that perturb tissue-specific TF binding 
motifs. This is consistent with previous studies from closely 
related Drosophila or mouse species demonstrating that large 
effects can be conferred by a small number of mutations 
affecting direct and cooperative binding of key TFs (Bradley 
et al., 2010; He et al., 2011; Stefflova et al., 2013). Interestingly,
we find that not all TF binding sites contribute equally to regu- 
latory divergence— in fact, we identify a broad spectrum of 
regulatory motifs that vary in frequency and effect, suggesting 
a mechanism through which evolution can fine-tune cis-regula-
tion across an enhancer landscape. One outlier in our analysis
is the Coordinator motif, a de novo consensus sequence that 
is strongly predictive of the surrounding chromatin features 
and is highly enriched at species-biased enhancers. We spec- 
ulate that the factor(s) that recognize the Coordinator motif 
play a privileged role in the establishment of enhancer compe- 
tence in this cell context, reminiscent of the Drosophila TAG- 
team motif bound by a pioneer factor Zelda (Liang et al., 
2008; Satija and Bradley, 2012). Furthermore, we find evidence 
of repressive inputs into quantitative modulation of enhancer 
activity, with a sizable number of motifs whose gain in strength 
negatively correlates with acquisition of permissive chromatin 
states. 

Our work provides a rich framework for future gene-centric 
studies on the developmental mechanisms of human morpho- 
logical evolution. Indeed, our approach identified loci that are 
known to profoundly affect NC development and craniofacial 
morphology, often in a dosage-sensitive manner. For example, 
we observed that two genes involved in CNCC specification, 
PAX3 and PAX7, are expressed at higher levels in chimps and 
are associated with clusters of chimp-biased enhancers. In 
mice, mutations of these TFs lead to reduction of pigmentation 
and snout length (Pax3) (Tremblay et al., 1995) and reduction 
of maxilla and pointed snout (Pax7) (Mansouri et al., 1996), 
features that are consistent with smaller jaw size and hypopig- 
mentation of humans as compared to chimps. Furthermore, 
humans are sensitive to alterations of PAX3 dosage, as haploin- 
sufficiency of this gene is associated with craniofacial, auditory, 
and pigmentation defects (Waardenburg syndrome, OMIM 
#193510), and genetic variants at this locus have been identified 
in GWAS studies as regulators of normal-range facial shape 
(Liu et al., 2012; Paternoster et al., 2012). Thus, variation in 



Figure 7. Species-Biased Enhancers Are Associated with Genes Affecting Craniofacial Structures 

(A) GREAT term enrichments and associated facial regions indicated for human-biased enhancers (q < 0.01, baseMean > 300) and chimp-biased enhancers (q <
0.01, baseMean > 300); binomial raw p values are shown below. Ontology categories are color coded (human phenotypes, red; mouse phenotypes, blue;
biological processes, green). 

(B) Table of highlighted divergently expressed genes showing direction of bias (human-biased versus chimp-biased indicated by H or C, respectively), DESeq 
adjusted p value of expression divergence, coordinates of nearby species-biased enhancers with corresponding bias (hg19), description of genetic phenotypes,
disease associations, comments, and relevant references. Full reference information can be found in Table S1.
PAX3 and PAX7 levels represents an attractive possible mecha-
nism for mediating facial shape divergence between humans 
and chimpanzees. 

We also find evidence that genes already known to affect facial 
morphology in other species, such as BMP4, are diverging in 
higher primates as well. BMP4 is the most well-understood 
example of a factor that influenced craniofacial morphological 
change during evolution, as it has been implicated in mediating 
changes in beak morphology in Darwin’s finches (Abzhanov 
et al., 2004) and in jaw shape in Cichlid fish (Albertson et al., 
2005). We were therefore intrigued to note that BMP4 is associ- 
ated with strongly human-biased enhancers and is expressed 
at higher levels in humans than in chimps. Conversely, expression 
of the BMP4 inhibitor BMPER was significantly chimp biased and
showed dramatic strengthening of the local chimp enhancer 
landscape. What would be the potential effects of elevated 
BMP4 expression on primate facial development? Interestingly, 
in the mouse, CNCC-specific overexpression of BMP4 results 
in a dramatic change of facial shape, with shortening of both 
the mandible and maxilla, rounding of the skull, and more anterior 
orientation of the eyes (Bonilla-Claudio et al., 2012), morphological
changes that resemble those observed between humans and
chimps. Thus, the same molecular mechanism that has been 
postulated to influence beak morphology in Darwin’s finches 
may also contribute to our uniquely human facial features. 

Even more intriguing, of five chromosomal regions that have 
been associated with normal-range human facial variation in 
GWAS, three (PRDM16, COL17A1, and PAX3) coincide with clus- 
ters of species-biased enhancers uncovered in our study (Liu et al.,
2012; Paternoster et al., 2012), suggesting a significant overlap 
between loci regulating intra- and inter-species variation of facial 
shape in higher primates. We therefore hypothesize that other 
divergent clusters identified in our study represent novel candi- 
dates for loci involved in the regulation of facial shape in humans. 
More broadly, we suggest that comparisons of human regulatory 
landscapes with those of a closely related primate in any tissue of 
interest may provide an effective strategy to identify candidate loci 
involved in normal-range and disease-associated variation. 

EXPERIMENTAL PROCEDURES 
CNCC Derivation 

Pluripotent lines were differentiated into CNCC as previously described (Rada- 
Iglesias et al., 2012). Details are provided in the Supplemental Experimental 
Procedures. 

Chromatin Immunoprecipitation and Preparation of ChIP-Seq 
Libraries 

Chromatin immunoprecipitation (ChIP) was performed using ~0.5-1 × 10⁷
CNCCs per experiment, as previously described (Bajpai et al., 2010; Rada- 
Iglesias et al., 2011, 2012). Antibodies used for ChIPs are listed in the 
Supplemental Experimental Procedures. Sequencing libraries were prepared 
starting from 30 ng of ChIP DNA using the NEBNext Multiplex Oligos for 
lllumina kit (Cat# E7335S). Libraries were multiplexed four to six samples per 
lane for 1 x 50 bp next-gen sequencing on lllumina HiSeq platform. 

Quantitative Analysis of H3K27ac ChIP-Seq and Identification of Divergence

All sequencing reads were aligned to both reference genomes (hg19 and panTro3),
regardless of species of origin, using Bowtie2 (version 2.2.4) with default settings.



Modal peak positions for candidate regulatory elements were determined using a mean shift procedure, described in the Supplemental Experimental Procedures. To obtain count statistics for each H3K27ac ChIP alignment, we counted read coverage in a 1.6 kb window surrounding modal peak positions. ENCODE-blacklisted regions and outlier regions with high counts in control input sequences relative to ChIP were removed as artifacts. Scores for visualization and classification of the remaining ChIPs were obtained using a kernel density estimate, as previously described (Buecker et al., 2014).
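The peak-localization and counting steps above can be illustrated with a minimal Python sketch. It assumes a Gaussian-kernel mean shift on one-dimensional read positions; the bandwidth, convergence tolerance, and mode-merging distance are hypothetical values chosen for illustration, because the actual parameters are given only in the Supplemental Experimental Procedures. The 1.6 kb counting window is taken from the text; blacklist filtering and kernel density scoring are omitted, and the function names are hypothetical.

```python
import numpy as np


def mean_shift_modes(positions, bandwidth=300.0, tol=1e-3, max_iter=200):
    """Move every read position uphill on a Gaussian KDE until it stops moving.

    bandwidth, tol and the merge distance below are illustrative assumptions.
    """
    x = np.asarray(positions, dtype=float)
    modes = x.copy()
    for _ in range(max_iter):
        # Gaussian weight of every read relative to every current mode estimate
        w = np.exp(-0.5 * ((modes[:, None] - x[None, :]) / bandwidth) ** 2)
        # Mean shift update: kernel-weighted mean of the reads
        shifted = (w @ x) / w.sum(axis=1)
        converged = np.max(np.abs(shifted - modes)) < tol
        modes = shifted
        if converged:
            break
    # Collapse estimates that converged to the same local maximum
    merged = []
    for m in np.sort(modes):
        if not merged or m - merged[-1] > bandwidth / 10:
            merged.append(m)
    return np.array(merged)


def window_counts(read_positions, modes, width=1600):
    """Count reads falling in a window of `width` bp centred on each mode."""
    r = np.sort(np.asarray(read_positions))
    half = width / 2
    lo = np.searchsorted(r, modes - half, side="left")
    hi = np.searchsorted(r, modes + half, side="right")
    return hi - lo
```

The all-pairs weight matrix keeps the sketch short; a genome-scale implementation would restrict each update to nearby reads.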

Calculations of species bias were inferred with DESeq2, based on the 
read counts from all replicates of H3K27ac at candidate enhancers from 
three human lines (one hESC, two iPSC) and two chimp lines (two iPSC). 
DESeq2 analysis was performed separately for panTro3 and hg19 counts;
then, conservatively, the higher p-adj value and the lower abs(log2FoldChange)
from either the hg19 or the panTro3 analysis were assigned to each region, while
rare regions with discordant calls (less than 0.1%) were excluded from the list
of biased sites.
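The conservative merging rule can be written out as a short Python sketch. It assumes one DESeq2 result per region per reference genome and treats a call as discordant when the two log2 fold changes have opposite signs; that definition of "discordant" is an assumption, since the text does not specify it. `DEResult` and `conservative_merge` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DEResult:
    padj: float    # adjusted p-value from DESeq2
    log2fc: float  # log2 fold change (species bias)


def conservative_merge(hg19: DEResult, pantro3: DEResult) -> Optional[DEResult]:
    """Keep the weaker of the two calls for a region; None if they disagree."""
    # Assumed definition of discordance: opposite directions of bias
    if hg19.log2fc * pantro3.log2fc < 0:
        return None
    padj = max(hg19.padj, pantro3.padj)                 # higher p-adj
    lfc = min(hg19.log2fc, pantro3.log2fc, key=abs)     # smaller |log2FC|
    return DEResult(padj=padj, log2fc=lfc)
```

Taking the larger adjusted p-value and the smaller absolute fold change means a region counts as species-biased only if it passes thresholds under both alignments, which limits artifacts from mapping to a single reference genome.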

ACCESSION NUMBERS 

All sequencing data sets were deposited in the NCBI GEO repository under 
accession ID GEO: GSE70751. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Supplemental Experimental Procedures, 
seven figures, and three tables and can be found with this article online at 

http://dx.doi.org/10.1016/j.cell.2015.08.036.
SUMMARY

Leptin is a hormone produced by adipose tissue
that acts in the brain, stimulating white fat break- 
down. We find that the lipolytic effect of leptin is 
mediated through the action of sympathetic nerve 
fibers that innervate the adipose tissue. Using 
intravital two-photon microscopy, we observe that 
sympathetic nerve fibers establish neuro-adipose 
junctions, directly “enveloping” adipocytes. Local 
optogenetic stimulation of sympathetic inputs in- 
duces a local lipolytic response and depletion of 
white adipose mass. Conversely, genetic ablation 
of sympathetic inputs onto fat pads blocks leptin- 
stimulated phosphorylation of hormone-sensitive 
lipase and consequent lipolysis, as do knockouts of 
dopamine β-hydroxylase, an enzyme required for
catecholamine synthesis. Thus, neuro-adipose junc- 
tions are necessary and sufficient for the induction of 
lipolysis in white adipose tissue and are an efferent 
effector of leptin action. Direct activation of sympa- 
thetic inputs to adipose tissues may represent an 
alternative approach to induce fat loss, circumvent- 
ing central leptin resistance. 

INTRODUCTION 

White adipose tissues (WATs) serve as a storage depot for en- 
ergy-rich triglycerides. In times of privation, this lipid storage 
can be released as part of an adaptive response to the energy 
shortage. Lipolysis, the process of hydrolyzing stored triglycer- 
ides in adipocytes, is regulated by several G-protein-coupled 
receptors, including adrenergic receptors, all of which elevate the
intracellular levels of cyclic adenosine monophosphate (cAMP) and thereby
activate protein kinase A (PKA) (Brasaemle, 2007).
PKA then phosphorylates several key target proteins, including the
lipid-droplet-associated protein perilipin, hormone-sensitive
lipase (HSL), and a set of esterases that collectively promote 
the hydrolysis of triglycerides into free fatty acids (FFAs) and 
glycerol, which are then released into plasma to meet the energy 
demands of other tissues (Brasaemle, 2007). HSL is a canonical 
target of PKA in adipocytes, and this enzyme catalyzes the conversion
of diacylglycerol to monoacylglycerol (Brasaemle, 2007).

Adipose tissue mass is homeostatically controlled by an endocrine
loop in which leptin acts on neural circuits in the hypothalamus
and elsewhere in the brain to regulate food intake and peripheral
metabolism (Friedman and Halaas, 1998). In wild-type (WT)
and leptin-deficient ob/ob animals, leptin treatment reduces food
intake and leads to a rapid depletion of fat mass (Halaas et al., 
1995, 1997; Montez et al., 2005). Of note, the depletion of fat 
mass after leptin treatment is distinct from that observed after 
food restriction in a number of respects: leptin treatment spares 
lean body mass and also potently stimulates glucose meta- 
bolism, while starvation results in a loss of lean body mass and 
causes insulin resistance (Newman and Brodows, 1983; Koffler 
and Kisch, 1996; Awad et al., 2009; Elia et al., 1999). In addition, 
leptin-deficient ob/ob mice pair-fed to leptin-treated ob/ob mice
lose only half the weight of those treated with leptin, further impli- 
cating a mechanism beyond a reduced food intake (Rafael and 
Herling, 2000). Because leptin has been shown to increase the 
sympathetic efferent signal to brown adipose tissues (BAT) 
(Scarpace and Matheny, 1998; Rezai-Zadeh and Munzberg, 
2013), it has been suggested that leptin also activates sympa- 
thetic efferents to WAT to increase lipolysis in WAT. However, 
this has not been directly shown, and the nature of the effector 
mechanism underlying leptin-stimulated lipolysis in WAT has 
not been defined. In particular, it has not been established 
whether the increased lipolysis in WAT in response to leptin is 
due to a circulating hormone (or hormones) such as norepinephrine (NE) and/or another mediator released either centrally or peripherally (for example, from the adrenal gland or macrophages), or to specific efferent neural inputs to WAT that mediate central leptin action. However, the effect of leptin on energy balance does not require the presence of intact adrenals, suggesting that this organ is unlikely to be the source of the lipolytic signal (Arvaniti et al., 1998).

While numerous previous studies have shown dense neural innervation of BAT, both functionally and anatomically, the innervation of WAT has been difficult to visualize. Thus, it has been suggested that neural inputs to WAT are either very sparse or difficult to distinguish from en passant axons with terminals on other cell types, such as those in the vasculature (Bartness et al., 2005; Bartness and Song, 2007; Youngstrom and Bartness, 1995; Giordano et al., 1996). Indeed, some reports have suggested that the only innervation of WAT is perivascular and that white adipocytes themselves are not directly innervated (Giordano et al., 2005). This controversy has heightened the uncertainty about the role of sympathetic neural activity in regulating WAT metabolism. Alternatively, macrophages in adipose tissue account for about 10% of the stromal vascular fraction (SVF); hence, local catecholamines produced by these cells could also contribute to lipolysis in WAT in vivo (Weisberg et al., 2003; Nguyen et al., 2011). Thus, the dramatic decrease of adipose tissue mass observed after leptin treatment could, in principle, be mediated by catecholamines or other mediators produced either locally or by neurons.

In this study, we use anatomic, optogenetic, biochemical, and genetic approaches to show that catecholamines released at heretofore-unidentified neuro-adipose junctions mediate the lipolytic effect of leptin, thus establishing the effector mechanism underlying the depletion of fat mass by leptin and, potentially, other stimuli. Our data demonstrate that local sympathetic activity in WAT is necessary and sufficient for the lipolytic effect of leptin. In addition, genetic evidence shows that β-adrenergic, but not α-adrenergic, receptors partially constitute a signaling pathway that accounts for the lipolytic effect of leptin. Moreover, the effect of pre-synaptic manipulations, such as neural gain of function or loss of function, is more profound than that of post-synaptic manipulations, thus suggesting direct activation of sympathetic inputs to adipose tissues as a strategy for the induction of fat loss.

Figure 1. Leptin Stimulates HSL Phosphorylation in WAT

(A) Immunostaining of p-HSL (red) in paraffin sections of epididymal fat of C57BL6/J mice that were peripherally administered 500 ng/hr recombinant leptin for 2 days.

(B) p-HSL and phosphorylated PKA substrates in total protein extracts of epididymal fats were examined by immunoblot analysis.

RESULTS



Phosphorylation of HSL in WAT as a Lipolysis Marker for Leptin Action

To directly assess the cellular effect of lep- 
tin on lipolysis in white adipocytes and 
provide a marker for leptin action, we 
searched for biochemical responses in white adipocytes that 
were specifically activated by leptin treatment. We used a battery 
of phospho-specific antibodies and found that the phosphoryla- 
tion of HSL was robustly increased in adipose tissue in response 
to leptin treatment. Note that our ability to define a biochemical ef- 
fect of leptin is dependent on the quality of the antibodies, and we 
found that the anti-pHSL antibody was extremely robust. As 
shown, peripheral administration of leptin led to a significant in- 
crease of phosphorylated HSL (p-HSL) in WAT that could be visu- 
alized by immunohistochemistry (Figure 1A) and quantified by 
immunoblot analysis (Figure 1B). We next investigated
whether the effect of leptin on HSL phosphorylation
was mediated by neural efferent outputs onto WAT.

Axonal Bundles Project to WAT and Form Sympathetic 
Neuro-adipose Junctions 

We first used tomography methods to determine whether fat 
pads were innervated. By coupling optical projection tomogra- 
phy (OPT) to a fat-clearing method that renders whole organs 
transparent, we were able to macroscopically visualize and 
document the nerve bundles that innervate the inguinal fat pad 
(Figure 2A; Experimental Procedures; Supplemental Experi- 
mental Procedures for details) (Gualda et al., 2013; Quintana 
and Sharpe, 2011). A full series of projections of the whole
organ is acquired from multiple angles, typically 800 to 1,600
angles, and from this series of projections, a stack of axial
slices can be visualized through back-projection reconstruction
(Figure 2B).
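To make the reconstruction step concrete, the following Python/NumPy sketch performs simple, unfiltered parallel-beam back-projection: each 1D projection is smeared back across the slice along its viewing angle, and the results are averaged. It is an illustration under stated assumptions (parallel rays, rotation axis at the slice center), not the software used in this study. Practical OPT reconstruction normally applies a ramp filter to each projection first (filtered back-projection) to remove the blur that plain back-projection introduces. The function name `back_project` is hypothetical.

```python
import numpy as np


def back_project(sinogram, angles_deg, size):
    """Unfiltered back-projection of a parallel-beam sinogram.

    sinogram: array of shape (n_angles, n_detectors), one projection per angle.
    Returns a size x size reconstructed slice.
    """
    n_angles, n_det = sinogram.shape
    # Pixel coordinates relative to the rotation axis (slice centre)
    c = (size - 1) / 2
    y, x = np.mgrid[0:size, 0:size] - c
    # Detector coordinates, also centred on the rotation axis
    det = np.arange(n_det) - (n_det - 1) / 2
    recon = np.zeros((size, size))
    for proj, theta in zip(sinogram, np.deg2rad(angles_deg)):
        # Detector position that each pixel projects onto at this angle
        t = x * np.cos(theta) + y * np.sin(theta)
        # Smear the projection back across the slice (linear interpolation)
        recon += np.interp(t, det, proj, left=0.0, right=0.0)
    return recon / n_angles
```

For a point on the rotation axis, every projection has a single bright detector at its center, and the back-projected slice peaks at the central pixel, surrounded by the characteristic star-shaped blur of the unfiltered method.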

From an OPT series of coronal optical sections of inguinal fat 
organ, we performed a 3D reconstruction, which enabled the 
visualization of thick axon bundles targeting the fat pad (Figures 
2C and 2D). Axon bundles can be identified based on the gray 
threshold level and morphological features that distinguish 
them from the vasculature (Figure 2E). These structures within 
the fat were then segmented using semi-automated software 
(see Experimental Procedures). 
Figure 2. Neural Projections in Fat Detected with OPT

(A) Schematic representation of the OPT method applied to the subcutaneous inguinal fat. (i) Tissue dissection, (ii) Sample clearing, (iii) Image acquisition, (iv) 
Sinogram transformation, (v) 3D reconstruction and segmentation. 

(B) OPT series of coronal sections of inguinal fat organ after 3D reconstruction. 

(C) Orthogonal 400-µm OPT slabs of inguinal fat in coronal, axial, and sagittal views. Axon bundles were identified based on the gray threshold level (arrows).

(D) 3D reconstruction in maximal intensity projection of the OPT coronal sections. 

(E) Surface view of segmented structures within inguinal fat. 



The neural bundles were micro-dissected from the subcutaneous fat pads and subjected to immunostaining for tyrosine hydroxylase (TH), a marker of sympathetic neurons, and β3-tubulin (Tub-3), a general marker for the peripheral nervous system (PNS) (Figure 3A). We found that, overall, approximately 50% of the Tub-3-positive neurons also expressed TH, thus establishing the presence of both catecholaminergic and non-catecholaminergic axons innervating subcutaneous fat pads (Figure 3A). Then, we used multiphoton microscopy on the intact inguinal WAT of a living mouse to visualize sympathetic neuro-adipose connections (Figures 3B and 3C; Experimental Procedures). We labeled adipocytes with LipidTOX, a lipophilic dye, and sympathetic axons by crossing TH-Cre-recombinase mice (TH-Cre) with a tdTomato reporter line (Rosa26-LSL-tdTomato) (Figure 3C). We observed that tdTomato-positive axons in fat pads made dense contacts with adipocytes through bouton-like structures that had the anatomic appearance of neuro-adipocyte junctions, resembling synapses (Figure 3C). We quantified these contacts in eight independent two-photon micrographs and determined that 8% ± 4.6% of adipocytes are in direct contact with sympathetic nerves.
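As an illustration of how such a per-image percentage can be summarized, the following Python sketch computes the fraction of contacted adipocytes in each micrograph and reports the mean and sample standard deviation across images. The text does not state whether the reported spread is an SD or an SEM, or whether fractions were averaged per image or pooled, so both choices here are assumptions. The function name is hypothetical, and the counts in the usage line are made-up illustrative numbers, not data from this study.

```python
from statistics import mean, stdev


def contact_summary(contacted, total):
    """Mean and sample SD (in %) of per-micrograph contact fractions.

    contacted[i]: adipocytes touching a labeled axon in micrograph i
    total[i]:     all adipocytes scored in micrograph i
    """
    if len(contacted) != len(total) or len(total) < 2:
        raise ValueError("need matching counts for at least two micrographs")
    pct = [100.0 * c / t for c, t in zip(contacted, total)]
    return mean(pct), stdev(pct)


# Hypothetical counts for three micrographs (illustrative only)
m, sd = contact_summary([1, 2, 3], [10, 10, 10])
print(f"{m:.1f}% ± {sd:.1f}%")  # prints 20.0% ± 10.0%
```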



Optogenetic Stimulation of Sympathetic Inputs to WAT 
Leads to Catecholamine Release, HSL Phosphorylation, 
and Fat Mass Depletion 

We assessed the function of the catecholaminergic fibers by 
crossing the TH-Cre mice to a channelrhodopsin (ChR2) reporter 
line, Rosa26-LSL-ChR2-YFP. ChR2-YFP (yellow fluorescent 
protein) showed a complete co-localization with the endogenous 
TH, as determined by immunostaining of YFP and TH (Figure 4A). 
ChR2-YFP-expressing axons that projected onto subcutaneous 
WAT were then optogenetically activated using a subcutane- 
ously implanted optical fiber targeting the right inguinal fat depot 
(see Experimental Procedures for surgical details). 

While optogenetic tools have been widely used in the CNS, they
have not been used as frequently to probe the function of peripheral
cells, including sympathetic neurons. We began by validating
the use of optogenetic stimulation of sympathetic neurons
in primary cultures of superior cervical ganglia (SCG) of TH-Cre ×
Rosa26-LSL-ChR2-YFP mice; SCG can be dissected with less
difficulty compared to other sympathetic ganglia (see Supple- 
mental Experimental Procedures for culture details). We found 



86 Cell 163, 84-94, September 24, 2015 ©2015 Elsevier Inc.






Cell 






Figure 3. Catecholaminergic Neurons 
Innervating Adipocytes Integrate Nerve 
Bundles of Mixed Molecular Identity 

(A) Partial co-localization of TH (red), an SNS 
marker, and Tub-3 (green), a general PNS marker, 
shown by immunohistochemistry of nerve bundles 
dissected from the inguinal fat pads of WT mice. 
Scale bars, 50 µm.

(B) Schematic representation of the two- 
photon intra-vital imaging of neurons in the 
inguinal fat pad. 

(C) Intra-vital two-photon microscopy visualization 
of a neuro-adipose connection in the inguinal 
fat pad of a live TH-Cre; LSL-tdTomato mouse;
LipidTOX (green) labels adipocytes. Scale bar,
100 µm.




that optogenetic stimulation of cultured sympathetic neurons 
increased expression of c-Fos, a marker for neuronal activity, 
in TH-positive cells and significantly stimulated NE release 
ex vivo, as assayed with ELISA (Figures 4B and 4C). NE release 
of ChR2-positive neurons was significantly higher than that
of ChR2-negative cells (749.6 ± 170.1 pg/ml versus 4.8 ± 1.7 pg/ml,
p < 0.05) (Figures 4B and 4C; see Experimental Procedures
for culture and stimulation details). 

Next, we stimulated ChR2-YFP-expressing axons in vivo 
unilaterally by placing optical fibers subcutaneously, aiming at 
inguinal fat pads located in the supra-pelvic flank of TH-Cre ×
Rosa26-LSL-ChR2-YFP mice (see Experimental Procedures for
stimulation details). Activation of the ChR2-positive axons in 
subcutaneous WAT led to a significant increase of NE in the stim- 
ulated fat pad, relative to the contralateral un-stimulated control 
side (2.7 ± 0.5 versus 1.1 ± 0.2, p < 0.05; Figure 4D). We also 
observed a significant increase of HSL phosphorylation of fat 
on the side ipsilateral to the optical fiber, compared to the 
contralateral un-stimulated side (Figure 4E). These data show 
that local activation of catecholaminergic inputs to fat could 
locally mimic the biochemical effect of leptin (Figures 4D and 
4E). Then, we tested whether a more prolonged (4-week) opto- 
genetic stimulation of ChR2-positive neurons in WAT could 
deplete fat mass (Figure 4F). The optical stimulation protocol
was set to deliver 20-Hz light pulses every other second, and


the volume of subcutaneous WAT was 
determined using MRI with 3D recon- 
struction (Figures 4F and 4G; see Experi- 
mental Procedures for details). After 
chronic activation, the size of the optoge- 
netically stimulated ipsilateral fat pads 
of TH-Cre; Rosa26-LSL-ChR2-YFP mice
was 23% ± 3.4% that of the contralateral 
control side, representing a statistically 
and biologically significant decrease 
in fat mass (Figures 4F and 4G, p < 
0.0001). This effect depended on ChR2 
expression, as the fat pad volume of stim- 
ulated fat pads in ChR2-negative mice 
was unchanged (86% ± 4.3% of the size 
of the contralateral control fat pad), ruling 
out a potential nonspecific effect of laser stimulation (Figure 4G; 
see Experimental Procedures for details). Together, the results 
provide anatomical and functional evidence that there are syn- 
apse-like sympathetic inputs onto white adipocytes and that 
their activation is sufficient to promote local NE release, HSL 
phosphorylation, and a reduction in the mass of an adipose 
tissue depot. 
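The chronic stimulation schedule can be made explicit with a small Python sketch that lists light-pulse onset times. It reads "light for every other second at 20 Hz" as 1 s of 20-Hz pulsing followed by 1 s without light; that reading, the function name `pulse_onsets`, and the omission of pulse width (not given in this section) are all assumptions.

```python
def pulse_onsets(duration_s, freq_hz=20.0, on_s=1.0, off_s=1.0):
    """Onset times (s) of light pulses: freq_hz pulses during each on_s
    period, followed by off_s with no light, repeated for duration_s."""
    cycle = on_s + off_s
    pulses_per_train = int(round(on_s * freq_hz))
    onsets = []
    train_start = 0.0
    while train_start < duration_s:
        for k in range(pulses_per_train):
            t = train_start + k / freq_hz
            if t < duration_s:
                # Round to microseconds to avoid float noise in the schedule
                onsets.append(round(t, 6))
        train_start += cycle
    return onsets
```

Under this reading, the light is on for half of each 2-s cycle, so a 4-s window contains two trains of 20 pulses each.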

Local Sympathetic Inputs Are Required for 
Leptin-Stimulated HSL Lipolysis in WAT 

Similar to optogenetic stimulation of sympathetic innervation in white fat, leptin treatment led to an increase in NE levels in the subcutaneous adipose organ. NE levels in WAT dissected from leptin-treated animals were significantly higher than those in controls (78.7 ± 16.8 pg NE/µg of protein versus 30.7 ± 4.1 pg NE/µg of protein, p < 0.05; Figure 5A). Interestingly, leptin treatment did not affect serum NE levels (Figure 5B), indicating that leptin increases NE release locally in white fat, but not systemically.

Next, we evaluated whether sympathetic activation is neces- 
sary for leptin-stimulated lipolysis by disrupting the neural inputs 
in WAT, using a pharmacologic blockade or local genetic abla- 
tion. First, we observed that administration of hexamethonium, 
a non-depolarizing anti-cholinergic ganglion blocker, signifi- 
cantly decreased the leptin-stimulated phosphorylation of HSL 
Figure 4. Optogenetic Stimulation of SNS Neurons in Fat Is Sufficient to Drive Lipolysis

(A) Complete co-localization of YFP (green) and TH (red) shown by immunohistochemistry of nerve bundles dissected from the inguinal fat pads of TH-Cre; LSL-ChR2-YFP mice. Scale bars, 50 µm.

(B) c-Fos (green) induction in cultured SNS neurons after optogenetic activation. Scale bars, 10 µm.

(C) Ex vivo NE release upon optogenetic stimulation of sympathetic SCG explants isolated from TH-Cre; LSL-ChR2-YFP mice and LSL-ChR2-YFP control mice 
(*p < 0.05; n = 3-6). Results are shown as mean ± SEM. 

(D) In vivo NE release in subcutaneous fat upon optogenetic stimulation of sympathetic neurons in WATs of TH-Cre; LSL-ChR2-YFP and LSL-ChR2-YFP control 
mice that were subcutaneously implanted with optical fibers targeting the inguinal fat pad (*p < 0.05; n = 8). Results are shown as mean ± SEM. 

(E) Immunoblot analysis of p-HSL in total protein extracts of subcutaneous fats of TH-Cre; LSL-ChR2-YFP and LSL-ChR2-YFP control mice that were subcu- 
taneously implanted with optical fibers targeting the inguinal fat pad and optogenetically stimulated for 2 weeks (details are given in Experimental Procedures). 

(F) MRI-guided visualization of fat in TH-Cre; LSL-ChR2-YFP and LSL-ChR2-YFP control mice that were optogenetically stimulated for 4 weeks (yellow indicates 
control inguinal fat pad, blue indicates light-stimulated fat pad; details are given in Experimental Procedures).

(G) Quantification of fat reduction in stimulated side versus the contralateral control side (****p < 0.0001; n = 6). Results are shown as mean ± SEM.

See also Movies S1 and S2.



in adipose tissue (Figure 5C). However, as the action of hexame- 
thonium is systemic and is not cell type specific, affecting all 
ganglionic transmission, we took a complementary approach 
by introducing a local neural crush injury to the fibers innervating 
epididymal fat pads. Because of the anatomy of the fat pad, 
nerve fibers in the distal portion of the tissues can be efficiently 
eliminated by a surgical crush of the perivascular axons running 
parallel to the main vessels (see Experimental Procedures). We 
carried out physical denervation by crushing the fibers with forceps
2 mm from the distal tip for 30 s. Leptin was delivered 3 days
post-surgery through an osmotic pump for 2 days. Consistent
with the effect of hexamethonium, after a crush injury to the local 
nerve, leptin treatment failed to increase HSL phosphorylation on 
the denervated side compared to the intact contralateral control 



(Figure 5D). This showed that local neural activation to WAT 
is required for the biochemical changes associated with leptin 
treatment. 

To confirm that leptin-mediated lipolysis is the result of activa- 
tion of sympathetic neural outputs to fat, we ablated these 
neurons by crossing the TH-Cre line with the diphtheria toxin re- 
ceptor (DTR) mice, Rosa26-LSL-DTR, and injected diphtheria 
toxin (DT) locally in subcutaneous inguinal WAT (Buch et al., 
2005). Local treatment with DT eliminated only those sympathetic
axons in the region of the injection site, without affecting
other local neuronal populations, as shown by the sparing of
TH-negative, Tub-3-positive axons at the site of injection (Figure
5E, p < 0.001; see Supplemental Experimental Procedures
for details). These injections were administered peripherally at
Figure 5. Sympathetic Neurons Are Locally Required for Leptin-Induced Lipolysis

(A and B) C57BL6/J mice were peripherally administered 500 ng/hr recombinant leptin or saline for 2 days. (A) NE content in subcutaneous fat pads (*p < 0.05; n = 5) and (B) NE serum levels were measured by NE ELISA (n = 4). Results are shown as mean ± SEM. ns, not significant.

(C and D) C57BL6/J mice were peripherally administered 500 ng/hr recombinant leptin (C) in combination with 500 µg/hr of the ganglionic blocker hexamethonium (Hexam.) or (D) at 3 days after local crush injury of nerves in fat pads. p-HSL in total protein extracts of epididymal fats was examined by immunoblot analysis.

(E) Fat pads in TH-Cre; LSL-DTR mice were locally treated with DT. Tissue-specific ablation of SNS axons was confirmed by immunostaining for Tub-3 and TH (**p < 0.001; n = 6). Results are shown as mean ± SEM.

(F) Immunoblot analysis of p-HSL in total protein extracts of subcutaneous fats of TH-Cre; LSL-DTR and control mice injected with DT following leptin treatment (500 ng/hr).

See also Figure S1.



low doses (10 ng/g) to ensure that the effect was local and also to spare TH-positive neurons in the CNS (Figure S1; Domingos et al., 2013). Genetic ablation of sympathetic input to adipose tissue completely blocked the effect of leptin on HSL phosphorylation on the ipsilateral side compared to the contralateral untreated side (Figure 5F). Together, these results demonstrate that activation of sympathetic neurons in fat is necessary for leptin to stimulate HSL phosphorylation in adipose tissue.

β-Adrenergic Signaling Influences Leptin-Stimulated Lipolysis in WAT

Consistent with a sympathetic mechanism for the leptin-mediated stimulation of lipolysis, systemic administration of the β-adrenergic agonist isoproterenol resulted in the rapid induction of p-HSL. As previously reported, isoproterenol also increased FFA release from WAT in vitro and in vivo (Figure S2). Therefore, we set out to test whether β-adrenergic signaling was required for leptin-stimulated lipolysis.

We first examined the lipolytic response to leptin in mice with a knockout (KO) of dopamine β-hydroxylase (DBH), a key enzyme in the synthesis of NE and epinephrine from dopamine (Figure 6).

After peripheral administration of leptin, there was a dramatic increase of HSL phosphorylation in WAT in the WT or DBH+/+ animals but a markedly diminished response in the DBH-/- littermates (Figure 6A). Consistent with this, total body fat decreased dramatically in WT mice treated with leptin, while there was only a slight change of fat mass in mice with the DBH deletion (p < 0.05; Figure 6B). Also consistent with a diminished lipolytic effect of leptin, DBH-/- mice lost less weight (Figure 6C). After 2 days of leptin treatment, the body weight of WT mice decreased by more than 6%, while the weight loss of DBH-/- mice was less than 2% (p < 0.05). The data suggest that catecholamines contribute to more than 50% of leptin's effect on body weight. Altogether, the results confirm that catecholamines are required for leptin-stimulated lipolysis and HSL phosphorylation in WAT.
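The "more than 50%" estimate follows directly from the two reported bounds. Writing $\Delta W$ for the percentage body-weight loss after 2 days of leptin, and assuming, as an interpretive simplification, that the residual loss in DBH-/- mice is catecholamine independent, the catecholamine-dependent fraction $f$ satisfies:

$$
f = \frac{\Delta W_{\mathrm{WT}} - \Delta W_{\mathrm{DBH}^{-/-}}}{\Delta W_{\mathrm{WT}}}
  = 1 - \frac{\Delta W_{\mathrm{DBH}^{-/-}}}{\Delta W_{\mathrm{WT}}}
  > 1 - \frac{2\%}{6\%} = \frac{2}{3} \approx 67\%
$$

because $\Delta W_{\mathrm{WT}} > 6\%$ and $\Delta W_{\mathrm{DBH}^{-/-}} < 2\%$ make the ratio smaller than $1/3$, so the bound lies well above one half.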

To test whether the action of leptin to drive lipolysis is mediated
through β-adrenergic signaling, we crossed β1/β2
double-KO mice to β3 KO mice to generate animals with a deletion of all
three β-adrenergic receptors (Figure 7). Animals with a deletion
of all three isoforms of the β-adrenergic receptors showed
Figure 6. Norepinephrine Deficiency Impairs Leptin-Induced Lipolysis

(A) Immunoblot analysis of p-HSL in total protein extracts of fat pads of dopamine β-monooxygenase mutant and control littermate mice (DBH-/- and DBH+/+, respectively) that were treated with 500 ng/hr recombinant leptin.

(B) Whole-body fat composition (*p < 0.05). Results are shown as mean ± SEM (n = 4-5).

(C) Body weight change after leptin treatment (*p < 0.05). Results are shown as mean ± SEM (n = 5).




significantly decreased HSL phosphorylation after leptin treatment in comparison to the double-KO controls (Figure 7A). However, this decrease was not as marked as that seen in individual depots after local ablation of sympathetic fibers using DT (Figure 5F). While leptin treatment significantly decreased total fat mass in β1-/-β2-/- mice, this effect was significantly reduced in β1-/-β2-/-β3-/- triple-KO mice (Figure 7B; paired ANOVA post hoc test, p < 0.01, comparing β1-/-β2-/-β3-/- with β1-/-β2-/-β3+/+ mice after leptin treatment). In contrast, α-adrenergic receptors appeared to play only a minor role in the leptin-stimulated loss of fat mass, because the α-adrenergic blockers phentolamine (5 mg/kg, intraperitoneally [i.p.]) and phenoxybenzamine (10 mg/kg i.p.) failed to diminish the catabolic responses to leptin treatment in β1-/-β2-/-β3+/+ control mice or β1-/-β2-/-β3-/- mice (Figure 7C). There was also a small suppression of body weight loss in response to leptin in the β1-/-β2-/-β3-/- mice (Figure 7D). These results show that β-adrenergic receptors are only partially necessary for leptin-mediated lipolysis of WAT and that the effect of a loss of β-adrenergic signaling is not as great as that observed by interfering with local neural outputs (Figure 5), suggesting that there could also be other neural mediators or interacting receptors on adipocytes.

DISCUSSION 

Leptin is known to stimulate lipolysis and reduce fat mass, 
though the physiologic mechanisms responsible for this have 
not been fully delineated. In this study, we present data using 
functional, anatomic, biochemical, and genetic approaches to 
show that leptin increases lipolysis via the actions of sympathetic 
neuronal efferents to adipose tissue. These data also provide 
molecular, cellular, and anatomic evidence confirming the existence of neuronal projections onto adipocytes, which have been the subject of conjecture but which have not been directly visualized.

The existence of neuro-adipose junc- 
tions in WAT had been inferred based 
on the fact that a pseudorabies retro- 
grade-tracing virus can visualize a set of 
neural projections in the brain (Bartness 
and Song, 2007). In addition, immunohistochemistry and
immunofluorescence have been used to visualize contacts between
sympathetic neurons and adipocytes in sliced tissue
(Giordano et al., 1996, 2005; Thompson, 1986). However, these 
methods, which require tissue slicing and fixation, do not distin- 
guish en passant neurons from those that directly project onto 
adipocytes. The visualization of adipocyte-projecting neurons 
that can completely envelop an adipocyte has not been accomplished so far. We were able to directly visualize neural termini on adipocytes using intravital multiphoton microscopy, which penetrates deep into live, intact tissue and thus allowed us to visualize deeper structures without the perturbations associated with classical histological methods, which, in past studies, may have compromised the integrity of the neuro-adipose termini (Helmchen and Denk, 2005).

Confocal or multiphoton microscopy methods are suitable for 
histological analysis at a microscopic spatial scale, but do not 
give a 3D perspective of the organization of the organ as a whole. 
At a macroscopic spatial scale, methods such as MRI or 
computed tomography (CT) allow for measurement of whole- 
body fat distribution. However, all of these methods lack the 
spatial resolution that is required for visualizing structures such 
as nerve bundles. OPT is a technique with physical principles similar to those of X-ray computed tomography, but it uses visible light instead of X-ray or gamma radiation (Gualda et al., 2013). Scattering of light passing through tissues is minimized by clearing lipids from the whole organ (Quintana and Sharpe, 2011). Unlike most currently available methods, OPT coupled to tissue clearing allows imaging of whole-mount samples at a spatial scale on the order of centimeters.

It has been previously shown that electrical stimulation of WAT 
nerve bundles can drive lipolysis (Correll, 1963). However, as 
shown here, nerve bundles in WAT have mixed molecular iden- 
tity, making it difficult to ascertain the identity of the neurons 
responsible for this effect. To address this limitation, we used 
Figure 7. Deficiency of All β-Adrenergic Receptors Influences Leptin-Induced Lipolysis

(A) Immunoblot analysis of p-HSL in total protein extracts of fat pads of β1-/-β2-/-β3+/+ and β1-/-β2-/-β3-/- mice that were treated with 500 ng/hr recombinant leptin.

(B) Whole-body fat composition (*p < 0.05, n = 4-5). Results are shown as mean ± SEM.

(C) α-Adrenergic receptors had a minor function in leptin-induced lipolysis (*p > 0.05, n = 4-5). Results are shown as mean ± SEM. ns, not significant.

(D) Whole-body fat composition of mice peripherally treated with recombinant leptin and α-blockers (phentolamine, 5 mg/kg, i.p.; and phenoxybenzamine, 10 mg/kg, i.p.) was measured (n = 4-5). Results are shown as mean ± SEM.

See also Figure S2.



optogenetics, which allows for cell-specific stimulation of neurons
(Domingos et al., 2011, 2013). In the present study, we
used optogenetics to specifically activate sympathetic neurons 
in TH-Cre mice. Another advantage of this approach is that it en- 
ables the specific activation of axonal projections and does not 
require stimulation of neuronal cell bodies (Petreanu et al., 
2007; Vrontou et al., 2013). This feature is particularly convenient 
for autonomic neurons, given the deep localization of their cell 
bodies along the anterior face of the spinal cord and the intrinsic 
difficulty of implanting optical fibers in this location. Previous
neural-tracing studies have shown that the sympathetic neurons
innervating the subcutaneous inguinal fat pads are located in the
13th thoracic ganglia, which lie at the dorsal edge of the
diaphragm muscle, at the transition between the thorax and the
abdomen (Youngstrom and Bartness, 1995). This anatomical location
is particularly inaccessible and unsuitable for chronic implants
of optical fibers or equivalent devices. However, as we show here,
subcutaneous implantation of optical fibers for stimulation of
nerve terminals is feasible and effective. We used this approach
to show that optogenetic gain of function of catecholaminergic
signaling at the neuro-adipose junction can lead to the
phosphorylation of HSL and lipolysis of WAT. Similarly, previous
loss-of-function experiments that assessed the effect of
sympathetic input on lipolysis did not allow analysis of the
effect of specific cells in the way that optogenetics can. For
example, mechanical denervation does not distinguish between
neurons that directly innervate adipocytes and those that are
merely passing through. We have not yet profiled the non-TH nerve
fibers, but it is reasonable to expect that some might be
parasympathetic, nociceptive, or other sensory fibers and/or en
passant axons.
The function of these fibers could be assessed with the same
optogenetic approach that we report here, by activating other
populations, for example cholinergic neurons in choline
acetyltransferase (ChAT)-Cre mice and/or neurons expressing other
molecular markers. Chemical ablation with capsaicin also has
limitations, as this treatment is not specific to sympathetic
neurons and affects all transient receptor potential vanilloid
(TRPV)-expressing fibers (Holzer, 1991). Chemical ablation with
6-hydroxydopamine is likely to affect dopaminergic as well as
enteric neurons, creating secondary systemic effects (Ding et al.,
2004).

To avoid these limitations and gain local control over
sympathetic neural activity, we used additional molecular genetic
tools that combine tissue specificity with a localized effect. We
show that ablation of sympathetic neurons by DTR expression in
TH-positive neurons, followed by local DT injection into WAT,
abolishes the effect of leptin on HSL phosphorylation. We also
noted that loss of function due to post-synaptic manipulations
(i.e., the triple β-adrenergic receptor KO in Figure 7) has a
smaller effect on the size of adipose tissue depots than that seen
after pre-synaptic manipulations such as loss of DBH or
pharmacologic or mechanical ablation of neural input to fat
(Figure 6). This suggests that leptin-induced production of NE by
sympathetic neurons could act through receptors other than the
three β-adrenergic receptors that we tested, as has been
suggested by others (Tavernier et al., 2005). Alternatively,
sympathetic neurons may co-release other neurotransmitters or
neuropeptides that signal through non-adrenergic receptors. This
could account for the residual leptin-induced weight loss seen in
DBH−/− mice, although this effect is not significant when compared
to controls. Thus, an in-depth knowledge of the underlying
sympathetic neural circuits could provide strategies to
pharmacologically activate specific sympathetic neuronal
populations, thereby circumventing leptin resistance, as a
potential treatment for obesity.

A canonical effect of leptin is to increase sympathetic signaling
to BAT, thus promoting thermogenesis (Bachman et al., 2002;
Landsberg et al., 1984). However, the role of autonomic
stimulation of white fat has been less well studied. Although
leptin has been assumed to increase lipolysis via activation of
sympathetic efferent fibers, this had not been directly shown. We
now show that sympathetic activity is also responsible for
leptin-stimulated lipolysis in WAT. Adrenergic agonists have been
shown to induce the formation of beige (brite) fat, and these data
suggest that sympathetic innervation may also stimulate this
phenotypic change in adipose tissue (Bachman et al., 2002; Giralt
and Villarroya, 2013; Dempersmier et al., 2015). Consistent with
this, leptin has also been suggested to increase the formation of
beige fat (Dodd et al., 2015).

Because brown adipocytes have smaller fat stores than white
adipocytes while having a higher metabolic demand, the sustained
thermogenic response of BAT might require a supply of FFA mobilized
from WAT in other parts of the body. Therefore, it is reasonable
to speculate that the coordinated sympathetic actions in BAT and
WAT in response to leptin could help maximize the hormone’s effect
on energy expenditure and fat metabolism. Future studies
delineating the neural circuits that connect the central action of
leptin with the peripheral activation of the sympathetic system
will be necessary to test this hypothesis. In particular, it would
be of great importance to develop technologies that allow
whole-body visualization and mapping of peripheral neuronal
circuits using some of the approaches presented here.

In summary, we provide direct evidence that the sympathetic 
neuro-adipose junction is both necessary and sufficient for leptin 
to drive lipolysis in WAT. 

EXPERIMENTAL PROCEDURES 
Antibodies and Drugs 

The antibodies were obtained from the following vendors: HSL (Cell Signaling 
Technology), phospho-HSL (Cell Signaling Technology), phospho-PKA sub- 
strate (Cell Signaling Technology), TH (Pel-Freez Biologicals), and actin 
(Sigma). Hexamethonium chloride, phentolamine, and phenoxybenzamine 
were from Sigma-Aldrich. DT was purchased from Merck Millipore. Recombi- 
nant mouse leptin was obtained from Amylin Pharmaceuticals. 



Mice 

DBH KO mice were kindly provided by Steve Thomas at the University of
Pennsylvania. Adrβ1−/−β2−/− and Adrβ3−/− mice were kindly provided by
Bruce Spiegelman at Harvard Medical School. TH-Cre, Rosa26-LSL-ChR2-YFP
(Stock No. 012-569; Daou et al., 2013), Rosa26-LSL-DTR, and C57BL/6J
mice at 6-10 weeks old were purchased from The Jackson Laboratory.
Animal procedures were approved by the ethics committee of Instituto
Gulbenkian de Ciência and the Institutional Animal Care and Use
Committee of Rockefeller University.

OPT 

Six-week-old C57BL/6 mice were sacrificed with carbon dioxide. The inguinal 
fat pads were dissected from the mice with Dumont #5 Forceps, fixed in 4% 
paraformaldehyde (PFA; Sigma-Aldrich) for 3 hr at room temperature (RT) 
and subjected to the OPT clearing protocol as described in the Supplemental 
Experimental Procedures. Images of the whole fat tissue were acquired using
a 1× lens mounted on an Infinitube tube lens and projected onto a Hamamatsu
FlashLT sCMOS camera. A total of 1,600 images were acquired for a full
rotation (0.25° steps). The series of projections was then pre-processed for
back-projection using FIJI (Schindelin et al., 2012) in order to remove hot
pixels and re-align the axis of rotation relative to the camera chip, and
the back-projection reconstruction was then performed using Skyscan’s
NRecon software. The stack of slices was further processed with FIJI to
increase contrast and saved for subsequent analysis with Amira V5.3. Using
this software, 3D reconstructions and image segmentation were performed
to identify and reconstruct individual parts of the fat organs. Detailed
instructions for setting up an OPT system can be found at
https://sites.google.com/site/openspinmicroscopy/home/opt.
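Hot-pixel removal before back-projection can be sketched as a thresholded 3 × 3 median filter, similar in spirit to FIJI’s Remove Outliers command. This is a minimal illustration, not the FIJI implementation. The function name `remove_hot_pixels`, the `threshold` parameter (how far above the local median a pixel must be to count as "hot"), and the list-of-rows image representation are all hypothetical choices made for the example.

```python
from statistics import median

def remove_hot_pixels(image, threshold=50.0):
    """Return a copy of `image` (a list of equal-length rows) in which any pixel
    brighter than the median of its 3x3 neighborhood by more than `threshold`
    is replaced by that median. Border pixels use only in-bounds neighbors."""
    rows, cols = len(image), len(image[0])
    cleaned = [row[:] for row in image]  # leave the input projection untouched
    for r in range(rows):
        for c in range(cols):
            window = [image[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            local_median = median(window)
            if image[r][c] - local_median > threshold:
                cleaned[r][c] = local_median
    return cleaned

# A flat 5x5 projection with one saturated "hot" pixel in the center.
frame = [[10] * 5 for _ in range(5)]
frame[2][2] = 4095
clean = remove_hot_pixels(frame)
print(clean[2][2])  # 10
```

Using the median rather than the mean keeps a single saturated pixel from biasing its own replacement value.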

In Vivo Two-Photon Microscopy 

Two-month-old mice were kept anesthetized with 2% isoflurane. During
surgery, body temperature was maintained at 37°C with a warming pad.
After application of a local anesthetic (lidocaine), a sagittal incision
of the skin was made above the supra-pelvic flank to expose the
subcutaneous inguinal fat pad. A custom-built imaging chamber was used to
minimize fat movement. Warm (37°C) imaging solution (in mM: 130 NaCl,
3 KCl, 2.5 CaCl2, 0.6 MgCl2·6H2O, 10 HEPES [without Na], 1.2 NaHCO3,
10 glucose; pH adjusted to 7.45 with NaOH) mixed with a lipid dye
(LipidTOX) was applied to label adipocytes, maintain tissue integrity, and
allow the use of a water-immersion objective. Imaging experiments were
performed on a two-photon laser-scanning microscope (Ultima, Prairie
Instruments). Live images were acquired at 8-12 frames per second, at
depths of 100-250 μm below the surface, using an Olympus 20×/0.8 NA
water-immersion objective, with the laser tuned to 860-940 nm and
emission filters of 525/50 nm and 595/50 nm for green and red
fluorescence, respectively. Laser power at the focal plane was adjusted
to 20-25 mW (maximum, 35 mW), depending on the imaging depth, the level
of tdTomato expression, and the spread of LipidTOX. tdTomato fluorescence
was used to identify TH-positive fibers until photobleaching occurred.
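Converting the millimolar recipe above into weights is simple arithmetic: grams per liter = mM × molecular weight / 1,000. The sketch below assumes standard molecular weights for the forms named in the recipe (anhydrous NaCl, KCl, CaCl2, NaHCO3, D-glucose, HEPES free acid, and MgCl2 hexahydrate). Check the formula weight printed on your own reagents, because hydrated salts such as CaCl2·2H2O differ. The names `RECIPE_MM`, `MOLECULAR_WEIGHT`, and `grams_needed` are illustrative.

```python
# Imaging-solution recipe from the text, in mM.
RECIPE_MM = {
    "NaCl": 130.0,
    "KCl": 3.0,
    "CaCl2": 2.5,
    "MgCl2·6H2O": 0.6,
    "HEPES": 10.0,
    "NaHCO3": 1.2,
    "glucose": 10.0,
}

# Assumed molecular weights in g/mol (anhydrous unless a hydrate is named).
MOLECULAR_WEIGHT = {
    "NaCl": 58.44,
    "KCl": 74.55,
    "CaCl2": 110.98,
    "MgCl2·6H2O": 203.30,
    "HEPES": 238.30,
    "NaHCO3": 84.01,
    "glucose": 180.16,
}

def grams_needed(recipe_mm, mw, volume_l=1.0):
    """Grams of each solute for `volume_l` liters: mmol/L * g/mol / 1000 * L."""
    return {name: conc * mw[name] / 1000.0 * volume_l
            for name, conc in recipe_mm.items()}

# Weights for 500 ml of imaging solution (pH is adjusted afterward with NaOH).
for name, grams in grams_needed(RECIPE_MM, MOLECULAR_WEIGHT, volume_l=0.5).items():
    print(f"{name:12s} {grams:8.4f} g")
```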

Leptin Treatment and Lipolysis Analysis 

To examine the effect of leptin treatment on lipolysis, leptin (delivery
rate of 500 ng/hr) or saline was delivered for 2 days through
subcutaneous osmotic pumps (Alzet). Body weight was recorded daily. Body
fat composition was measured using the EchoMRI body analyzer at the end
point, before subcutaneous or epididymal adipose tissues were collected.
HSL phosphorylation was detected by immunohistochemistry of 6-μm paraffin
sections and/or western blot of subcutaneous or epididymal adipose
tissues. NE levels in serum and subcutaneous fat pads were determined
with an NE ELISA kit (Labor Diagnostika Nord GmbH). Tissues were
homogenized and sonicated in homogenization buffer (1 N HCl, 1 mM EDTA),
and cellular debris was pelleted by centrifugation at 13,000 rpm for
15 min at 4°C. NE values for all tissue samples were normalized to total
tissue protein concentration.
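Both the total leptin dose implied by the pump settings and the normalization of tissue NE to protein reduce to simple arithmetic. The sketch below is illustrative only. The function names and the example sample values are hypothetical, and it assumes that the ELISA readout and the protein assay are expressed per the same volume of homogenate, so that the volume cancels.

```python
def ne_per_mg_protein(ne_pg_per_ml, protein_mg_per_ml):
    """NE normalized to total protein (pg NE per mg protein).

    Both inputs are assumed to be concentrations in the same homogenate,
    so the homogenate volume cancels out."""
    if protein_mg_per_ml <= 0:
        raise ValueError("protein concentration must be positive")
    return ne_pg_per_ml / protein_mg_per_ml

def total_pump_dose_ug(rate_ng_per_hr, days):
    """Total amount delivered by a constant-rate osmotic pump, in micrograms."""
    return rate_ng_per_hr * 24 * days / 1000.0

# Leptin at 500 ng/hr for 2 days delivers 24 ug per mouse.
print(total_pump_dose_ug(500, 2))  # 24.0
# Hypothetical sample: 300 pg/ml NE in a homogenate containing 1.5 mg/ml protein.
print(ne_per_mg_protein(300.0, 1.5))  # 200.0
```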

Mechanical Denervation 

The nerve bundle 2 mm distal from the tip of the epididymal fat pad was
physically crushed with forceps for 30 s and then released. Leptin was
administered through an osmotic pump 3 days after the nerve crush, and HSL
phosphorylation was assessed after 2 days of leptin treatment.
Hexamethonium Chloride, Isoproterenol, and α-Blocker Treatment

Hexamethonium chloride (500 μg/hr) was administered during leptin
treatment through a separate osmotic pump, and the α-blockers phentolamine
(5 mg/kg, intraperitoneally [i.p.]) and phenoxybenzamine (10 mg/kg, i.p.)
were administered twice a day during the course of leptin treatment
(500 ng/hr). Isoproterenol (250 μg per mouse) was delivered in saline
through jugular vein injection. Blood was drawn by tail bleeding, and
plasma FFA was measured using a NEFA kit (Wako Pure Chemical Industries).
To measure FFA release upon isoproterenol treatment in vitro, adipose
tissue explants were dissected, cultured in Hanks’ medium, and stimulated
with isoproterenol at 10 μg/ml for 3 hr. FFA was measured in the medium.
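Because the α-blockers are dosed per kilogram of body weight, the amount and volume injected per mouse depend on each animal’s weight and on the stock concentration. The following is a minimal sketch. The 25-g body weight and the 1 mg/ml stock concentration are hypothetical values chosen for illustration; neither parameter is reported in the text.

```python
def ip_dose(body_weight_g, dose_mg_per_kg, stock_mg_per_ml):
    """Return (mg of drug, ml to inject) for a weight-based i.p. dose."""
    drug_mg = dose_mg_per_kg * body_weight_g / 1000.0  # convert g to kg
    volume_ml = drug_mg / stock_mg_per_ml
    return drug_mg, volume_ml

# Hypothetical 25-g mouse and hypothetical 1 mg/ml stocks.
for name, dose in [("phentolamine", 5.0), ("phenoxybenzamine", 10.0)]:
    mg, ml = ip_dose(25.0, dose, stock_mg_per_ml=1.0)
    print(f"{name}: {mg:.3f} mg in {ml:.3f} ml")
```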

NE Measurements after Optogenetic Stimulation Ex Vivo 

Superior cervical ganglia (SCG) were removed from 28- to 30-day-old
TH-Cre × Rosa26-LSL-ChR2-YFP mice under a stereomicroscope and placed in
DMEM (Invitrogen). Ganglia were cleaned of the surrounding tissue capsule
and transferred into eight-well tissue culture chambers (Sarstedt) that had
been coated with poly-D-lysine (Sigma-Aldrich) in accordance with the
manufacturer’s instructions. Ganglia were then covered with 5 μl Matrigel
(BD Biosciences) and incubated for 7 min at 37°C. DMEM without phenol red
(Invitrogen) supplemented with 10% fetal bovine serum (Invitrogen), 2 mM
L-glutamine (Biowest), and nerve growth factor (Sigma-Aldrich) was
subsequently added. SCG were cultured for a minimum of 24 hr prior to
further manipulation. Sympathetic neurons in explant cultures were
depolarized on a Yokogawa CSU-X1 spinning-disk confocal system by pointing
the 488-nm laser line at the region of interest (ROI) for 200 μs.
Stimulation was repeated five times at 40% laser intensity. NE in the SCG
explant culture medium was determined with an NE ELISA kit (Labor
Diagnostika Nord GmbH). The same procedure was performed for LSL-ChR2-YFP
control mice.

Surgeries and Optogenetic Stimulation 

General anesthesia was induced and maintained with isoflurane. After
application of a local anesthetic (lidocaine), a sagittal incision of the
skin was made above the neck and supra-pelvic flank. A hemostat was
inserted into the incision, and the subcutaneous tissue was spread by
opening and closing its jaws to create a longitudinal pocket for the
optical fiber. The pocket was made long enough to accommodate about 4-6 cm
of fiber (Thorlabs FT200). The tip of the fiber targeted the anatomical
location of the inguinal fat pad. The other end of the fiber, the ferrule
connector end, was secured along the skin with sutures and dermal staples.
Appropriate local analgesia was provided post-surgically. Optogenetic
stimulations were performed 48 hr after surgery.

The stimulation session in Figure 4D lasted 4-6 hr and consisted of 1-s
trains of 20-Hz blue laser pulses delivered every other second, from a
473-nm solid-state laser source (OEM-BL-473-00100-CWM-SD-05) with an
output power of 100 mW. Ferrule-coupled optical fibers of 200-μm diameter
(Thorlabs; FT200EMT-CANNULA-TS1031629) were connected to ferrule patch
cords (Thorlabs; FT200EMT-FC/PC-ferrule) with mating sleeves (Thorlabs;
ADAF1), and the latter were connected to the laser source via an FC/PC
adaptor.
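The Figure 4D protocol (1-s trains of 20-Hz pulses, one train every other second) can be written as a list of pulse onset and offset times, which is one common way to program such a schedule into a pulse generator. This is a hedged sketch. The text does not report the pulse width, so the 10-ms width below is a hypothetical placeholder. `pulse_schedule` is an illustrative helper and is not part of any laser or Thorlabs software.

```python
def pulse_schedule(duration_s, freq_hz=20.0, train_s=1.0, period_s=2.0,
                   pulse_width_s=0.010):
    """Return (onset, offset) times in seconds for every light pulse.

    A train of `freq_hz` pulses lasting `train_s` starts every `period_s`
    seconds (1 s on, 1 s off with the defaults). `pulse_width_s` is an
    assumed value; the source does not state it."""
    if pulse_width_s >= 1.0 / freq_hz:
        raise ValueError("pulse width must be shorter than the pulse period")
    pulses_per_train = round(freq_hz * train_s)
    n_trains = int(duration_s // period_s)  # only complete trains are scheduled
    schedule = []
    for train in range(n_trains):
        train_start = train * period_s
        for k in range(pulses_per_train):
            onset = train_start + k / freq_hz
            schedule.append((onset, onset + pulse_width_s))
    return schedule

# A 4-hr session (the lower bound of the 4-6 hr range) at the default settings.
session = pulse_schedule(4 * 3600)
print(len(session))  # 144000 pulses: 7,200 trains x 20 pulses
```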

NE in subcutaneous fat pads was determined with an NE ELISA kit (Labor
Diagnostika Nord GmbH) as described earlier.

The stimulation protocol in Figure 4E was applied daily for 2 weeks,
exclusively during the rodent rest period. Longer protocols, as in Figure
4F, lasted 4 weeks. Each stimulation session lasted 4-6 hr and was
performed as described earlier.

MRI Fat Measurements and Fat Pad Segmentation 

Mice were subjected to optogenetic stimulation as stated earlier, perfused 
with 4% PFA/PBS, post-fixed over 2-3 days, and embedded in Fomblin Oil 
(Sigma-Aldrich) for scanning. Imaging was performed on a 7.0 T 70/30 Bruker 
Biospec small-animal MRI system with a 12-cm diameter 450 mT/m ampli- 
tude and 4,500 T/m/s slew rate actively shielded gradient subsystems with 
integrated shim capability. A linear coil with 7-cm diameter and a length suf- 
ficient to cover the whole body of the animal was used for excitation and 
reception of the magnetic resonance signal. Two image sets were acquired,
one with fat suppression and one without. Axial images covering the whole
animal in 75 contiguous 0.4-mm-thick slices were acquired in an
interleaved manner using a RARE (rapid acquisition with relaxation
enhancement) pulse sequence with a RARE factor of 2. Four averages were
acquired with a flip angle of 90°, echo time (TE) = 10 ms, repetition time
(TR) = 2,468 ms, field of view = 10 × 3 cm, and matrix size = 256 × 128
(acquisition matrix size = 256 × 96), resulting in a spatial resolution of
0.391 × 0.234 × 0.4 mm. The fat suppression added to the second scan
consisted of a 90° Gaussian pulse of 2.6067-ms duration and 1,051.1-Hz
bandwidth. Data were converted into .tif files using FIJI. The
distribution of subcutaneous inguinal fat was determined by
semi-automated segmentation of the scanned images in Amira V5.3, which
groups pixels with the same grayscale intensity index. Automatic
segmentation based on gray threshold levels, which decomposes the image
domain into subsets, allowed us to define the right and left inguinal fat
depots, which were saved as separate fields. The volume of each
subcutaneous fat pad was calculated as the number of voxels multiplied by
the volume of a single voxel. The size of the stimulated fat pad was
determined, and the effect of optogenetic stimulation on fat mass was
calculated within the same animal relative to the non-stimulated
contralateral side.
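The volume calculation described above can be illustrated with a short sketch. It derives the voxel size from field of view divided by matrix size (100 mm / 256 ≈ 0.391 mm and 30 mm / 128 ≈ 0.234 mm, with 0.4-mm slices). It then thresholds a grayscale volume, splits the foreground into 6-connected components, and reports one depot’s volume relative to the contralateral one. This is a simplified stand-in for the Amira workflow, not a reproduction of it. The threshold value, the function names, and the toy volume are all hypothetical.

```python
from collections import deque

# Voxel size (mm): field of view / matrix size in-plane, slice thickness through-plane.
VOXEL_MM = (100.0 / 256, 30.0 / 128, 0.4)
VOXEL_MM3 = VOXEL_MM[0] * VOXEL_MM[1] * VOXEL_MM[2]  # about 0.0366 mm^3

NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def label_components(volume, threshold):
    """Group voxels with intensity >= threshold into 6-connected components.

    `volume` is indexed volume[z][y][x]; each component is a list of
    (z, y, x) coordinates, in the order their seed voxel is first met."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    seen, components = set(), []
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                if (z, y, x) in seen or volume[z][y][x] < threshold:
                    continue
                seen.add((z, y, x))
                queue, component = deque([(z, y, x)]), []
                while queue:  # breadth-first flood fill from the seed voxel
                    cz, cy, cx = queue.popleft()
                    component.append((cz, cy, cx))
                    for dz, dy, dx in NEIGHBORS:
                        pz, py, px = cz + dz, cy + dy, cx + dx
                        if (0 <= pz < depth and 0 <= py < height and 0 <= px < width
                                and (pz, py, px) not in seen
                                and volume[pz][py][px] >= threshold):
                            seen.add((pz, py, px))
                            queue.append((pz, py, px))
                components.append(component)
    return components

def depot_volume_mm3(component):
    """Volume = number of voxels x volume of a single voxel."""
    return len(component) * VOXEL_MM3

def percent_change(stimulated_mm3, contralateral_mm3):
    """Stimulated depot volume relative to the non-stimulated contralateral side."""
    return 100.0 * (stimulated_mm3 - contralateral_mm3) / contralateral_mm3

# Toy single-slice "scan" with two bright depots; 100 is a hypothetical threshold.
toy = [[
    [200, 200, 0, 0, 0, 200, 0],
    [200, 200, 0, 0, 0, 200, 0],
    [0, 0, 0, 0, 0, 0, 0],
]]
# Order the depots left to right by their smallest x coordinate.
left, right = sorted(label_components(toy, threshold=100),
                     key=lambda comp: min(x for _, _, x in comp))
print(len(left), len(right))  # 4 2
# Treat the right depot as the stimulated side in this toy example.
print(percent_change(depot_volume_mm3(right), depot_volume_mm3(left)))  # -50.0
```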

Statistical Methods 

Statistics were performed in GraphPad Prism and involved the computation of
means and SEM, which are reported in each figure legend. Student’s t tests
and ANOVAs were used where appropriate, and p values are indicated in the
text.
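For readers who want to reproduce these summaries outside Prism, mean ± SEM and the pooled-variance (Student’s) two-sample t statistic can be computed with the Python standard library alone. This sketch stops at the t statistic and its degrees of freedom. Converting t into a two-tailed p value requires the t distribution, for example via `scipy.stats.ttest_ind`, whose default `equal_var=True` corresponds to Student’s test. The sample values below are made up for illustration and are not data from the study.

```python
import math
from statistics import mean, stdev

def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))

def student_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

# Made-up example values (not data from the study), n = 4 per group.
control = [10.0, 12.0, 11.0, 13.0]
treated = [8.0, 9.0, 7.0, 8.0]
print(mean_sem(control))
print(student_t(control, treated))
```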

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Supplemental Experimental Procedures, 
two figures, and two movies and can be found with this article online at 
http://dx.doi.org/10.1016/j.cell.2015.08.055.
SUMMARY

To understand how different diets, the consumers’ 
gut microbiota, and the enteric nervous system 
(ENS) interact to regulate gut motility, we developed 
a gnotobiotic mouse model that mimics short-term 
dietary changes that happen when humans are trav- 
eling to places with different culinary traditions. 
Studying animals transplanted with the microbiota 
from humans representing diverse culinary traditions 
and fed a sequence of diets representing those of all 
donors, we found that correlations between bacterial 
species abundances and transit times are diet 
dependent. However, the levels of unconjugated 
bile acids — generated by bacterial bile salt hydro- 
lases (BSH) — correlated with faster transit, including 
during consumption of a Bangladeshi diet. Mice 
harboring a consortium of sequenced cultured bac- 
terial strains from the Bangladeshi donor’s micro- 
biota and fed a Bangladeshi diet revealed that the 
commonly used cholekinetic spice, turmeric, affects 
gut motility through a mechanism that reflects bacte- 
rial BSH activity and Ret signaling in the ENS. These 
results demonstrate how a single food ingredient in- 
teracts with a functional microbiota trait to regulate 
host physiology. 

INTRODUCTION 

Gut motility, a key physiologic parameter governing digestion 
and absorption of nutrients, is affected by diet (Cummings 
et al., 1976, 1978), gut microbes (Husebye et al., 1994, 2001; 
Wichmann et al., 2013), the enteric nervous system (ENS) (Edery 
et al., 1994; Romeo et al., 1994), and host genetics (Levy et al., 
2000; Whorwell et al., 1986). At present, we lack a detailed
understanding of the complex and dynamic interrelationships between
these factors, particularly in the global context of diverse cultural
traditions concerning foods, their methods of preparation, and
the varied human gut microbiota that have evolved under these
dietary conditions. Intestinal transit times measured in >1,000
healthy individuals representing diverse populations worldwide
varied within and between groups, likely reflecting a combination
of these factors (Burkitt et al., 1972). The advent of
culture-independent methods for characterizing the structure and expressed
functions of a gut microbiota creates an opportunity to identify 
new approaches for understanding gut motility and optimizing 
the nutritional benefits derived from different dietary practices. 

In the present study, we began by modeling short-term diet 
changes associated with global human travel in gnotobiotic 
mice colonized with gut microbiota from healthy human donors 
from around the world. We hypothesized that with an interna- 
tional focus, we could conduct a screen of food types and 
microbiota for potential mediators of motility common to diverse 
diets and gut communities. Without making any assumptions 
regarding the healthiness of faster or slower motility, our strategy 
was to identify diets and microbiota whose interactions result in 
highly contrasting transit times in order to subsequently home 
in on specific dietary ingredients and microbiota-encoded meta- 
bolic capacities that affect motility. We reasoned that by gener- 
ating clonally arrayed collections of bacterial strains cultured 
from donor microbiota that transmitted disparate motility pheno- 
types in different dietary settings, we could deliberately manipu- 
late which members of the collection were used for colonization 
of mice based on specific metabolic attributes that could be iden- 
tified from their genomes and confirmed by direct biochemical 
assays in vitro. Colonization with different subsets of the commu- 
nity could then be performed in the context of concurrently 
manipulated diets, either in wild-type animals or in those with 
deliberately manipulated genetic features known to affect ENS 
function. Our immediate goal for this type of preclinical modeling 
was to decipher the mechanisms by which diet-by-microbiota 
interactions can regulate gut motility. Our long-term goal was to 
implement an approach that could, in principle, be generalized 
to dissect the effects of interactions between (1) ingredients rep- 
resented in established as well as emerging dietary traditions/ 
trends, (2) members of consumers’ gut microbiota, and (3) gut 
motility and potentially other aspects of human physiology. 

Cell 163 , 95-107, September 24, 2015 ©2015 Elsevier Inc. 95 





Cell 



[Figure 1 panels: ingredient word clouds and macronutrient pie charts for the primal, American, Bangladeshi, Malawian, and Amerindian diets; legend: Carbohydrates, Proteins, Fats]



Figure 1. Compositions of Diets Used in the 
Six-Phase Travel Experiment 

In the foreground, word clouds convey the specific 
ingredients used; font sizes depict the weight- 
based proportional representations of ingredients 
within each diet. Pie charts in the background 
represent the macromolecular compositions. See
also Table S1.



RESULTS 

Modeling Diet and Motility Changes Associated with 
Global Human Travel in Gnotobiotic Mice 

In an initial “travel” experiment, intact uncultured fecal microbiota
samples obtained from six healthy adults representing different
geographic locations and cultural/culinary traditions were
transplanted into adult germ-free C57BL/6 male mice (n = 6 recipient
mice/donor microbiota). Microbiota donors included (1) three
residents of the USA (a twin pair stably discordant for obesity, with
both co-twins consuming an American diet without self-imposed dietary
restrictions [USA unrestricted] (Ridaura et al., 2013), plus another
lean individual who had consumed a protein- and fat-rich primal diet
for a number of years [USA primal]), (2) an Amerindian living in a
remote rural village in the Amazonas State of Venezuela (Yatsunenko
et al., 2012), (3) a Bangladeshi resident of an urban slum
(Subramanian et al., 2014), and (4) a Malawian from a rural village in
the southern part of the country (Yatsunenko et al., 2012) (see Table
S1A for a description of donor characteristics and Table S1B for an
analysis of microbiota transplantation efficiency). The six groups of
transplant recipients were fed a sequence of six sterilized diets
formulated to represent those consumed by the microbiota donors
(Figure 1; Tables S1C and S1D), in essence simulating the varying
dietary experiences of humans and their microbiota during travel. In
each case, the initial and final diets in the sequence represented
the native or home diet of the donor, in order to characterize the
longer-term effects of dietary exposures during travel and the degree
to which transit times recover. In between, travel diets were given
in the same sequence, in an order chosen randomly but executed
uniformly for all mice, as permitted by the type of home diet (Figure
2A). The starting and ending home diets were given for 14 and 8 days,
respectively, while each travel diet was administered for 8 days.
Intestinal transit times were measured at the end of each diet phase
by gavaging mice with non-absorbable red carmine dye and recording
the time from gavage to the first appearance of the dye in the feces
(Kashyap et al., 2013; Li et al., 2011; Yano et al., 2015) (Figure 2B;
Table S2A). Carmine dye does not perturb the structure of the gut
bacterial community: 16S rRNA analysis of fecal samples collected
before and after carmine administration from 9-week-old gnotobiotic
mice colonized with a fecal microbiota from a conventionally raised
C57BL/6 donor showed no significant effect of the dye, as judged by
weighted UniFrac distances (p > 0.05, two-tailed Student’s t test).
Moreover, fecal samples collected on the days of transit time
measurements were taken prior to carmine administration.
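The phase structure described above can be generated programmatically: the home diet for 14 days, the remaining diets in a fixed travel order for 8 days each, then 8 days back on the home diet. This sketch assumes the travel order given in the Figure 2 legend (primal, unrestricted American, Bangladeshi, Malawian, Amerindian), with the home diet skipped. The last day of each phase is when transit time would be measured. The function name and the diet labels are illustrative.

```python
TRAVEL_ORDER = ["primal", "American", "Bangladeshi", "Malawian", "Amerindian"]

def travel_schedule(home_diet, home_days=14, travel_days=8, return_days=8):
    """Return a list of (diet, first_day, last_day) tuples, with days counted from 1."""
    if home_diet not in TRAVEL_ORDER:
        raise ValueError(f"unknown diet: {home_diet}")
    phases = [(home_diet, home_days)]
    phases += [(diet, travel_days) for diet in TRAVEL_ORDER if diet != home_diet]
    phases.append((home_diet, return_days))
    schedule, day = [], 1
    for diet, length in phases:
        schedule.append((diet, day, day + length - 1))
        day += length
    return schedule

for phase in travel_schedule("Bangladeshi"):
    print(phase)
```

With the default lengths, every group completes six phases over 54 days.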

Aggregating data from all animals at all time points of this six-phase
travel experiment revealed a normal distribution of transit times
(Figure 2C). The average within-mouse variance throughout the
experiment was 27.7 min, while the average between-mouse variance at a
given time point was 29.3 min. Repeated-measures ANOVA with transit
time as the dependent variable demonstrated that diet (p = 5.6 × 10⁻⁵),
the donor microbiota (p = 2.3 × 10⁻³), and the interaction of diet and
microbiota (p = 2.6 × 10⁻³) were all significant factors (Table S3A).
The most contrasting diet-by-microbiota effects on transit times were
documented in mice colonized with the Bangladeshi compared to the USA
unrestricted microbiota when they consumed Bangladeshi and primal
diets (Figure 2D). Specifically, mice colonized with the USA
unrestricted microbiota had significantly faster motility (i.e.,
shorter transit times) when consuming the Bangladeshi diet compared to
the primal diet (p < 0.002, two-tailed Student’s t test); the opposite
was observed in mice colonized with a Bangladeshi microbiota (p <
0.006, two-tailed Student’s t test). We tested the robustness of these
most contrasting motility phenotypes by colonizing animals with fecal
microbiota obtained from three healthy Bangladeshi adults (including
the donor tested in the first experiment) and three healthy USA
unrestricted adults (a repeat of the obese co-twin in the discordant
pair, plus a new obese and a new lean donor; see Table S1A for donor
characteristics). Recipients (n = 5 mice/donor microbiota) were
subjected to three diet phases beginning and ending with a “local”
diet (i.e., the primal diet for mice colonized with a USA unrestricted
microbiota or the Bangladeshi diet for mice colonized with a
Bangladeshi microbiota) and including an interval “non-native” diet
(Figure S1A). The results of this three-phase travel experiment
confirmed that significantly contrasting transit times were imparted
by the interactions of these diets and microbiota (p < 2.6 × 10⁻⁵,
F = 19.8, analysis of covariance [ANCOVA]), although the effect size
and statistical significance of differences in transit time varied by
individual microbiota donor (Figure S1B; Table S2B).
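The two variance summaries quoted above can be made concrete. Assume a table of transit times with one row per mouse and one column per time point. The average within-mouse variance is then the mean, over mice, of each mouse’s variance across time points. The average between-mouse variance is the mean, over time points, of the variance across mice. The sketch below uses the sample variance (n − 1 denominator). That estimator is an assumption, because the text does not specify which one was used.

```python
from statistics import mean, variance

def within_between_variance(transit):
    """`transit[m][t]` is the transit time of mouse m at time point t.

    Returns (average within-mouse variance, average between-mouse variance)."""
    within = mean(variance(row) for row in transit)  # per mouse, across time points
    columns = list(zip(*transit))                    # one tuple per time point
    between = mean(variance(col) for col in columns) # per time point, across mice
    return within, between

# Toy table: 2 mice x 3 time points (minutes); values are illustrative only.
print(within_between_variance([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]))
```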

Figure 2. Diet and Microbiota Significantly Impact Intestinal Transit Times

(A) Schematic of the six-phase travel experiment. Groups of adult germ-free C57BL/6 mice were colonized with fecal microbiota from six healthy donors and fed diets representative of those consumed by all donors in the sequence shown in (B).

(B) The central squares of this heatmap represent means of transit times for each diet-microbiota combination as measured by a carmine red dye assay; the frames of the squares represent SEM (n = 6 mice/donor microbiota). Microbiota are represented along rows and diet phases along columns. Each group of mice consumed human diets in the order shown from left to right; home diets were always consumed during the initial and final diet phases but skipped in the intervening travel diet progression (primal diet → unrestricted American diet → Bangladeshi diet → Malawian diet → Amerindian diet).

(C) Histogram showing the distribution of all transit times recorded throughout the six-phase travel experiment.

(D) The most contrasting diet-by-microbiota effects on transit times were observed in mice colonized with a Bangladeshi or USA unrestricted microbiota and fed Bangladeshi versus primal diets. Results for USA unrestricted (lean) and USA unrestricted (obese) were aggregated and are represented together as USA unrestricted. Results from the “home” and “return home” phases for mice colonized with a Bangladeshi microbiota and fed a Bangladeshi diet were also aggregated. Statistical significance was determined using a two-tailed Student’s t test; *p < 0.05. Within each box, the horizontal line denotes the mean value of the transit times. The lower and upper boundaries of each box represent the 25th and 75th percentiles, respectively, while whiskers represent 1.5 times the interquartile range.

See also Figures 1, 3, S1, and S2 and Tables S1, S2, S3, S4, and S5.

Correlations between the Relative Abundances of Gut
Bacterial Strains and Transit Times Are Diet Dependent

To identify relationships between specific bacterial taxa, diet,
and transit time phenotypes, we sequenced PCR amplicons generated
from the V4 region of bacterial 16S rRNA genes present in fecal
microbiota collected throughout the six-phase and three-phase travel
experiments (984 fecal samples; 22,470 ± 630 reads per sample [mean ±
SEM]; Table S4). 16S rRNA reads were grouped into operational
taxonomic units based on whether they shared >97% nucleotide sequence
identity (97%ID OTUs). Principal coordinates analysis (PCoA) based on
unweighted UniFrac, a phylogenetic metric that computes the similarity
between any two microbiota based on the degree to which their
component OTUs share branch length on a bacterial tree (Lozupone and
Knight, 2005), indicated that community assembly was rapid and highly
reproducible within a given group of mice that received the same donor
microbiota in both the six-phase and three-phase travel experiments
(Figures S2A and S2C).
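PCoA itself is classical multidimensional scaling of a distance matrix: square the distances, double-center them, and take the leading eigenvectors. The sketch below computes only the first principal coordinate, in pure Python, using power iteration. It accepts any symmetric distance matrix. Computing UniFrac distances from a phylogeny would require a separate tool and is outside the scope of this example. Power iteration assumes that the largest-magnitude eigenvalue of the centered matrix is positive, which holds for Euclidean-embeddable distances. The function name and the toy matrix are illustrative.

```python
import math

def first_principal_coordinate(dist, iterations=500):
    """First PCoA axis of a symmetric distance matrix (list of lists).

    Classical MDS: B = -1/2 * J D^2 J, then the leading eigenvector of B
    scaled by sqrt(eigenvalue), found here by power iteration."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]
    grand_mean = sum(row_mean) / n
    # Double-centering (A is symmetric, so column means equal row means).
    b = [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
         for i in range(n)]

    v = [float(i + 1) for i in range(n)]  # non-constant start (B maps constants to 0)
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n
        v = [x / norm for x in w]
    # The Rayleigh quotient gives the eigenvalue for the converged unit vector.
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * bv[i] for i in range(n))
    return [x * math.sqrt(max(eigenvalue, 0.0)) for x in v]

# Toy example: distances between points lying at 0, 1, and 3 on a line.
toy = [[0.0, 1.0, 3.0],
       [1.0, 0.0, 2.0],
       [3.0, 2.0, 0.0]]
coords = first_principal_coordinate(toy)
print([round(c, 3) for c in coords])
```

For perfectly collinear samples, the first axis recovers the original spacing (up to sign), which makes the toy case a convenient sanity check.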

The microbiota donor was the predominant factor explaining variance in
unweighted UniFrac distances between samples from the different
experimental groups (p < 0.001, within-group as compared to
between-group similarity; permutational multivariate analysis of
variance using distance matrices [PERMANOVA]; Figures S2B and S2D).
Nonetheless, diet had consistent effects across different treatment
groups and phenotypes: 87 diet-discriminatory 97%ID OTUs that were
robust to donor microbiota and motility phenotypes were identified by
applying a machine learning algorithm (Random Forests) to the 16S rRNA
dataset generated from all fecal microbiota samples collected from all
mouse recipients of all human donor microbiota throughout the
six-phase travel experiment (Figure 3; Table S5). We elected to apply a
decision-tree-based algorithm for feature selection (i.e., in this
case, selection of the most diet-discriminatory OTUs) so that we would
not have to make any distributional assumptions regarding our dataset
of proportional OTU abundances. The Random Forests-derived model
predicted which diet was being consumed in the subsequent three-phase
travel experiment with a mean accuracy of 83% ± 0.02% (range
79%-86%; 10,000 replications), significantly better than the null
distribution
(p < 2.2 × 10^-16). These 87 diet-discriminatory OTUs were
not significantly correlated with transit times across all diet-micro- 
biota combinations in either experiment. In an analysis of all 416 
97%ID OTUs with relative abundances above the limit of
detection (0.01%) in mouse fecal samples collected throughout both
travel experiments, just a single OTU, Parabacteroides gordonii 
(OTU ID 240), was significantly correlated, after Bonferroni correc- 
tion for multiple comparisons, with transit times across the highly 
contrasted diet-microbiota combinations (rho = 0.3, p = 0.02). This
organism has not been reported to be associated with altered
intestinal motility in humans.

Figure 3. Diet-Discriminatory OTUs Are Robust to Different Donor Microbiota and Motility Phenotypes

(A) Out-of-bag (OOB) estimated error rates in a Random Forests model for predicting diet, stratified by donor microbiota, as a function of numbers of diet-discriminatory OTUs. For each microbiota, 40 OTUs were sufficient to discriminate diet, yielding a total of 87 unique OTUs across six microbiota donors from five cultural/dietary traditions in the six-phase travel experiment.

(B) Evidence for the robustness of diet-discriminatory OTUs to donor microbiota and motility phenotype. Feature importance scores of 87 diet-discriminatory OTUs in each diet-microbiota context are represented in this heatmap, which was generated following unsupervised hierarchical clustering. A sparse Random Forests model built using these diet-discriminatory OTUs accurately predicted diet in the three-phase travel experiment.

See also Table S5.
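
PERMANOVA, used above to test whether the donor explains variance in UniFrac distances, compares a pseudo-F statistic computed from a distance matrix against its distribution under random relabeling of samples. The sketch below follows the standard sum-of-squares formulation. The function names, the toy distance matrix, the permutation count, and the seed are illustrative assumptions; the analyses in the text presumably relied on an established implementation. With 999 permutations, the smallest attainable p value is 0.001.

```python
import random
from itertools import combinations
from typing import List, Sequence, Tuple


def pseudo_f(d: List[List[float]], groups: Sequence[str]) -> float:
    """PERMANOVA pseudo-F from a symmetric distance matrix and one group label per sample."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares: squared distances over all pairs, divided by n.
    ss_total = sum(d[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares: squared distances within each group, divided by group size.
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        ss_within += sum(d[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (n - a))


def permanova(d, groups, permutations=999, seed=0) -> Tuple[float, float]:
    """Observed pseudo-F and a permutation p value obtained by shuffling group labels."""
    rng = random.Random(seed)
    observed = pseudo_f(d, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(d, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy example: two hypothetical donors, three mice each; within-donor distance 0.1, between 0.9.
groups = ["donor1"] * 3 + ["donor2"] * 3
d = [[0.0 if i == j else (0.1 if groups[i] == groups[j] else 0.9) for j in range(6)] for i in range(6)]
f_stat, p_value = permanova(d, groups)
print(f_stat, p_value)
```

With only six samples, there are just 20 ways to assign three labels of each kind, and two of them reproduce the observed split. The permutation p value here therefore settles near 0.1 however strong the separation is. Much larger sample sets, like those in the travel experiments, do not face this limit.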

We identified 27 OTUs present in both the USA unrestricted and 
Bangladeshi microbiota that had significant diet-dependent cor- 
relations with transit times documented in the context of either 
the Bangladeshi or primal diets, but not both (Table S3B). The 
relationships of bacteria to transit times were strain specific: 
two strains of Eubacterium desmolans (OTU IDs 170124 
and 158946) had opposing relationships with transit times 
within the setting of primal diet consumption. A single OTU, 
E. desmolans (OTU ID 158946), was significantly correlated 
with transit times in both diets, but remarkably, the correlations 
were opposite in the Bangladeshi versus primal diet contexts 
(p < 9.3 × 10^-6, F = 9.6, ANCOVA testing interaction of
E. desmolans abundance with diet). In the unrestricted USA
diet context, yet another OTU in the USA unrestricted and Bangladeshi
microbiota (Clostridiales, OTU ID 261590) was correlated
with transit times (rho = -0.57, p = 0.04). Hence, a next-genera- 
tion probiotic strain designed to impact motility may require 
simultaneous consumption of a specific diet or diet ingredient 
to exert its effect. 
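
The diet-dependent sign reversal described for E. desmolans can be illustrated by computing Spearman's rank correlation separately within each diet context. The minimal sketch below uses made-up abundances and transit times (arbitrary units) purely for illustration. It does not reproduce the ANCOVA interaction test reported above, which formally tests whether the abundance-transit relationship differs by diet.

```python
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical (OTU relative abundance, transit time) pairs for one OTU in two diet contexts.
contexts: Dict[str, Tuple[List[float], List[float]]] = {
    "bangladeshi": ([0.01, 0.02, 0.03, 0.05], [300.0, 280.0, 260.0, 250.0]),
    "primal": ([0.01, 0.02, 0.03, 0.05], [250.0, 265.0, 290.0, 310.0]),
}
for diet, (abundance, transit) in contexts.items():
    print(diet, spearman_rho(abundance, transit))
```

Pooling both contexts before correlating would blur these opposite within-diet trends, which is why a test with a diet-by-abundance interaction term is the appropriate formal analysis.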

Transit times at the ends of the initial and final home diets often 
varied. This finding could not be ascribed to increasing age 
(p = 0.7, F = 0.1, one-way ANOVA of transit time as a function
of age). Mathematically, if we consider two functions h(m) and
t(m), then h(m) need not equal h(t(h(m))). We postulated
that if these two functions represent the effects of
home (h) and travel (t) diets, respectively, on a microbiota m, 
then the result of a home diet may depend on whether inter- 
vening travel diets were consumed. Indeed, an analysis 
comparing fecal microbiota at the ends of the initial and final 
home phases in the six-phase travel experiment disclosed that 
community structure, while largely similar, always exhibited 
statistically significant changes (p < 0.05, paired two-tailed Stu- 
dent’s t test after Bonferroni correction) in the proportional abun- 
dances of one or more OTUs. For example, mice colonized with 
fecal microbiota from both USA unrestricted individuals exhibited a
significant increase in the relative abundance of Bacteroides 
ovatus between the first and last home diet periods, while mice 
colonized with microbiota from the adult Bangladeshi donor ex- 
hibited significantly increased abundances of an OTU classified 
as Clostridiales, and a decrease in Ruminococcus obeum. 
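
The argument that h(m) need not equal h(t(h(m))) can be made concrete with a toy model. The sketch below assumes, purely for illustration, that each diet phase moves every OTU's relative abundance halfway toward a diet-specific equilibrium without reaching it. The profiles, the rate, and the function names are hypothetical and are not fitted to the data reported here.

```python
from typing import List

Profile = List[float]  # relative abundances of a few OTUs, summing to 1

# Hypothetical diet-specific equilibria (each sums to 1).
HOME_EQUILIBRIUM: Profile = [0.6, 0.3, 0.1]
TRAVEL_EQUILIBRIUM: Profile = [0.2, 0.3, 0.5]
RATE = 0.5  # assumed fraction of the gap to equilibrium closed during one diet phase


def relax(m: Profile, target: Profile, rate: float = RATE) -> Profile:
    """Move each abundance part of the way toward the target (a convex combination)."""
    return [x + rate * (goal - x) for x, goal in zip(m, target)]


def h(m: Profile) -> Profile:
    """One home-diet phase."""
    return relax(m, HOME_EQUILIBRIUM)


def t(m: Profile) -> Profile:
    """One travel-diet phase."""
    return relax(m, TRAVEL_EQUILIBRIUM)


m0: Profile = [0.4, 0.4, 0.2]
home_only = h(m0)           # approximately [0.5, 0.35, 0.15]
after_travel = h(t(h(m0)))  # approximately [0.475, 0.3125, 0.2125]
print(home_only, after_travel)
```

Because a single phase closes only part of the gap, the community ends the second home phase at a different point than it did after the first, even though both home phases share the same equilibrium.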

The imperfect recovery of transit time following diverse dietary 
exposures during travel could reflect not only these structural 
differences but also functional differences in the microbiota. 
Therefore, we characterized metabolic features of the micro- 
biota in the context of the most highly contrasted motility 
phenotypes. 

Microbially Deconjugated Bile Acid Metabolites Are 
Correlated with Faster Gut Transit 

To identify metabolic features that correlate with motility pheno- 
types, we applied ultra-high-performance liquid chromatog- 
raphy mass spectrometry (UPLC-MS) to fecal samples collected 
from mice in the three-phase travel experiment on the same days 
when their transit times were measured (three USA unrestricted and 
three Bangladeshi donor microbiota; five mice/donor micro- 
biota; three fecal samples analyzed/mouse). We observed 
>2,500 unique m/z peaks that were present in more than one 
mouse. Spearman’s rank correlations (without Bonferroni 
correction) yielded 599 m/z peaks that were significantly correlated
with transit times, of which 67 (11%) were putatively identified
as bile acid metabolites (Tables S6A and S6B). In mice, the
predominant primary bile acids are beta-muricholic acid and 
cholic acid, while in humans they are chenodeoxycholic acid 
and cholic acid (Haslewood, 1967). Prior to secretion from
hepatocytes into biliary canaliculi, bile acids are conjugated with
either taurine (predominant in mice; Falany et al., 1997) or glycine
(predominant in humans; Falany et al., 1994) to decrease passive
absorption by intestinal enterocytes. Bile acids have microbici- 
dal activity; members of the gut microbiota neutralize these 
effects by metabolizing host bile acids, beginning with deconju- 
gation catalyzed by microbial bile salt hydrolases (BSHs) (Drasar 
et al., 1966). Since bile acids are modified by the microbiota, 
we considered whether differences in bile acid metabolite pro- 
files could explain discordant microbiota-associated motility 
phenotypes. 

To gain insights into the relationships between OTUs and bile 
acid metabolites as they pertain to transit times, we calculated 
Spearman’s rank correlation coefficients between the propor- 
tional abundances of 97%ID OTUs in fecal microbiota, the 
peak intensities of fecal bile acids, and transit times in the 
three-phase travel experiment (Figure 4; Table S6B). After Bon- 
ferroni correction, we identified 118 OTUs with significant corre- 
lations between their abundances and levels of one or more fecal 
bile acids; only one of these OTUs, Blautia (OTU ID 296977), also 
had a significant correlation between its abundance and transit 
times (slower transit) (Table S6B). A sparse linear model, built 
after regressing transit times against all bile acid metabolite 
levels then simplified by applying stepwise backward feature 
selection, demonstrated that levels of just five bile acids (7-keto- 
deoxycholic acid, muricholic acid, taurocholic acid, tauro-beta- 
muricholic acid, and tauro-muricholic acid sulfate) accurately 
predicted transit times in the three-phase travel experiment 
(rho = 0.54, p = 1.8 × 10^-7, Spearman’s rank correlation),
outperforming a linear model built with an equivalent number of
randomly selected non-bile-acid metabolites (rho = 0.17,
p = 0.47, Spearman’s rank correlation; mean values over 1,000
replications) (Figure S1C; Table S6C). Of these five bile acids,
none was significantly associated with either diet. Only one bile 
acid species, tauro-muricholic acid sulfate, was significantly 
associated with the geographic origin of the microbiota donor; 
it was found at significantly higher concentrations in fecal
specimens collected from mice colonized with microbiota from subjects
residing in Bangladesh (Figure 4). Bile acids that were correlated with faster
transit times were unconjugated (7-ketodeoxycholic acid and 
muricholic acid), whereas those that correlated with slower 
transit times were conjugated (tauro-beta-muricholic acid, taur- 
ocholic acid, and tauro-muricholic acid sulfate) (Figure 4; Table 
S6B). Levels of multiple bile acids, including these five bile 
acids, were significantly correlated with multiple OTUs (Figure 4), 
underscoring the complexity of microbial bile acid metabolism. 
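
The sparse model described above was obtained by regressing transit times on bile acid levels and pruning features by backward stepwise selection. The minimal sketch below shows that procedure in plain Python under stated assumptions. It fits ordinary least squares with an intercept, uses the Akaike information criterion (AIC) to decide whether dropping a feature improves the model, and stops when no single removal helps. The criterion actually used is described in the Supplemental Experimental Procedures and may differ. The function names and the orthogonal toy design are hypothetical.

```python
import math
from typing import List, Sequence


def ols_rss(X: List[List[float]], y: Sequence[float], cols: Sequence[int]) -> float:
    """Residual sum of squares for y ~ intercept + the selected columns of X."""
    rows = [[1.0] + [r[c] for c in cols] for r in X]
    p = len(rows[0])
    # Augmented normal equations [A^T A | A^T y], solved by Gauss-Jordan elimination.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))  # partial pivoting
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for k in range(p):
            if k != col:
                factor = aug[k][col]
                aug[k] = [v - factor * w for v, w in zip(aug[k], aug[col])]
    beta = [aug[i][p] for i in range(p)]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def aic(rss: float, n: int, k: int) -> float:
    """Gaussian-likelihood AIC up to an additive constant; k counts fitted coefficients."""
    return n * math.log(rss / n) + 2 * k


def backward_eliminate(X: List[List[float]], y: Sequence[float]) -> List[int]:
    """Drop one feature at a time while doing so lowers AIC; return kept column indices."""
    n = len(y)
    selected = list(range(len(X[0])))
    current = aic(ols_rss(X, y, selected), n, len(selected) + 1)
    while selected:
        trials = [
            (aic(ols_rss(X, y, [c for c in selected if c != drop]), n, len(selected)), drop)
            for drop in selected
        ]
        best, drop = min(trials)
        if best >= current:
            break
        selected.remove(drop)
        current = best
    return selected


# Toy orthogonal (+1/-1) design: y depends on columns 0 and 2 only, plus small residual noise.
c1 = [1, 1, 1, 1, -1, -1, -1, -1]
c2 = [1, 1, -1, -1, 1, 1, -1, -1]
c3 = [1, -1, 1, -1, 1, -1, 1, -1]
c4 = [a * b for a, b in zip(c1, c2)]
noise = [0.1 * a * b * c for a, b, c in zip(c1, c2, c3)]
X = [[float(v) for v in row] for row in zip(c1, c2, c3, c4)]
y = [5 + 2 * a - 3 * c + e for a, c, e in zip(c1, c3, noise)]
print(backward_eliminate(X, y))  # [0, 2]
```

Stepwise selection is greedy, so with correlated metabolites it can keep one member of a correlated group and drop another that is equally informative. The random non-bile-acid baseline reported above is one way to check that the selected set is better than chance.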

Figure 4. Significant Correlations between Fecal Bile Acid Metabolite Concentrations and the Relative Abundances of Bacterial 97%ID OTUs

Bile acid metabolite profiling and 16S rRNA analysis were performed on fecal samples collected from mice in the three-phase travel experiment (Figure S1). Spearman rank correlations were calculated between bile acid concentrations and relative abundances of 97%ID OTUs. Unsupervised hierarchical clustering was applied. Significant associations (p < 0.05 calculated by two-tailed Student’s t test, with Bonferroni correction) between microbiota/diet and bile acids/OTUs are represented in the vertical and horizontal side panels. Associations between bile acid levels and transit times were calculated by linear modeling with stepwise backward feature selection, as detailed in the text and Supplemental Experimental Procedures.

See also Figure S1 and Tables S4 and S6.

Turmeric Alters Gut Motility

Intestinal bile acid concentrations are largely dictated by dietary
components (e.g., peptides and fats) that trigger intestinal
signals (e.g., cholecystokinin), which, in turn, influence
gallbladder contraction and release of bile into the lumen of the
proximal small intestine. To further characterize interactions be- 
tween specific OTUs, bile acid metabolism, and diet, we exam- 
ined the effects of turmeric on transit time. We selected turmeric, 
a spice with cultural significance commonly used in Bangladeshi 
cuisine, because it has a dose-dependent cholekinetic effect; 
i.e., its active ingredient, curcumin, stimulates gallbladder 
contraction and thus increases luminal bile acid levels. The effect 
sizes of turmeric’s cholekinesis vary between reported studies, 
possibly due to population-based differences (e.g., European 
versus Asian subjects) or differences in how the spice was 
administered (Marciani et al., 2013; Rasyid and Lelo, 1999; 
Rasyid et al., 2002). In a study using serial hydrogen breath tests 
to assess carbohydrate fermentation and small bowel transit 
time, investigators observed that turmeric-containing Japa- 
nese-style curry fed to Japanese individuals increased fermenta- 
tion and shortened small bowel transit time compared to curry 
prepared without turmeric (Shimouchi et al., 2009). The microbial 
underpinnings of these observations in human subjects are 
unknown, as metagenomic or metabolomic analyses of their 
gut microbiota were not performed. 

We initially defined the effect of turmeric in adult male C57BL/6 
gnotobiotic mice colonized with a collection of anaerobic bacte- 
rial strains cultured from the fecal microbiota of a healthy 2-year- 
old Bangladeshi child (Table S7A). We generated this culture 
collection from a child rather than an adult to avoid the potential 
confounding effects of chronic antecedent turmeric exposure on 
microbiota features. This child, like the three Bangladeshi adults 
whose microbiota were tested in the earlier experiments, lived in 
Mirpur, an urban sub-district of Dhaka. The sequenced, clonally 
arrayed bacterial culture collection gave us the capacity to 
perform follow-up experiments in which specific strains were 
selected for colonization based on their capacity to metabolize 
bile acids. 

Following gavage of the entire culture collection, composed of 
53 bacterial strains (Table S7A), mice were initially fed a Bangla- 
deshi diet lacking turmeric for 10 days (diet phase 1), then the 
same diet containing turmeric for 10 days (diet phase 2), and 
finally the unsupplemented Bangladeshi diet again for 10 days 
(diet phase 3). A control group was maintained under germ- 
free conditions and subjected to the same sequence of diets 
(n = 6 animals/group). To limit carryover of turmeric from the prior 
diet, old bedding was replaced with fresh new bedding at the 
start of each diet phase. 

Colonization was highly reproducible between animals, with 
community assembly completed within 3-5 days after gavage. 
At the end of the first diet phase, 44 ± 1.2 (mean ± SEM) of the
53 input strains were detectable in fecal samples collected 
from recipient animals, based on community profiling by shotgun 
sequencing (COPRO-seq) of fecal DNA (see Experimental Pro- 
cedures). Colonized mice had significantly faster transit times 
at each diet phase compared to their germ-free counterparts 
(p = 5.0 × 10^-6, p < 4.7 × 10^-7, and p < 1.7 × 10^-5 for diet
phases 1, 2, and 3, respectively, two-tailed Student’s t test; 
Table S2C). Consumption of turmeric was associated with a sig- 
nificant slowing of motility (i.e., longer transit time) (Table S2C). 
UPLC-MS of fecal samples collected from germ-free mice at 
the end of each diet phase disclosed that ingestion of this chol- 
ekinetic spice was associated with significantly increased levels 
of taurohyodeoxycholic acid (p = 0.003, one-tailed Student’s 
t test) and tauro-muricholic acid sulfate (p = 0.03, one-tailed Stu- 
dent’s t test) compared to the period of unsupplemented diet 
consumption (Table S6A). As expected, no unconjugated bile 
acids were detected in the germ-free group during any of the 
diet phases. We included a curcumin standard in order to quan- 
tify fecal curcumin levels; however, curcumin was undetectable 
in all samples. 
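
Summaries such as "44 ± 1.2 (mean ± SEM) of the 53 input strains" combine a per-animal detection call with a mean and standard error across animals. The sketch below shows that arithmetic on hypothetical COPRO-seq-style relative abundances. The detection cutoff is an assumed placeholder, because the criterion for calling a strain present is not stated in this section.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Tuple

DETECTION_LIMIT = 1e-4  # hypothetical relative-abundance cutoff for calling a strain present


def strains_detected(profile: Dict[str, float], limit: float = DETECTION_LIMIT) -> int:
    """Number of strains whose relative abundance in one fecal sample reaches the cutoff."""
    return sum(1 for fraction in profile.values() if fraction >= limit)


def mean_sem(values: List[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))


# Hypothetical per-mouse profiles for a three-strain community.
profiles = [
    {"strainA": 0.5, "strainB": 0.5, "strainC": 0.0},
    {"strainA": 0.7, "strainB": 0.2999, "strainC": 0.0001},
    {"strainA": 1.0, "strainB": 0.0, "strainC": 0.0},
]
counts = [strains_detected(p) for p in profiles]  # [2, 3, 1]
print(mean_sem(counts))
```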

To directly test the hypothesis that microbiota with different 
capacities to deconjugate bile acids transmit distinct transit 
time phenotypes, we first used a UPLC-MS-based in vitro assay 
to screen all members of the clonally arrayed culture collection 
for their BSH (EC 3.5.1.24) activities. The screen demonstrated 
that OTUs representing a number of phylotypes had the ability 
to deconjugate at least one of the two primary bile acids found 
in mice (Table S7A). BLAST predictions, based on the presence 
in a strain’s genome of homologs of known BSH genes (E-value 
threshold cutoff <10^-5) were correct in predicting in vitro BSH
enzymatic activity for 85% of the bacterial strains. Only ten 
strains did not deconjugate either bile acid in vitro (six members 
of the genus Enterococcus, three members of Eggerthella, and 
one belonging to Enterobacteriaceae). The strains with BSH ac- 
tivity were largely members of the genera Bifidobacterium and 
Enterococcus. We then assembled two bacterial consortia, 
each composed of seven strains representing the taxonomic di- 
versity of the BSH-positive and BSH-negative subsets within the 
culture collection: the “BSH hi” consortium contained four members
of Enterococcus and three members of Bifidobacterium,
and the “BSH lo” consortium had five members of Enterococcus,
one Eggerthella species, and one Enterobacteriaceae (see Table
S7B for a summary of KEGG-based annotations of the
sequenced genomes of these 14 strains). Members of the two 
consortia had in vitro growth rates under anaerobic conditions 
in rich medium that were not significantly different from one 
another (p = 0.92, two-tailed Student’s t test). 
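
The BLAST-based prediction described above amounts to calling a strain BSH-positive when its genome carries a homolog of a known BSH gene below the E-value cutoff, then scoring agreement with the in vitro assay. The minimal sketch below assumes that the best-hit E-value per strain has already been extracted from BLAST output. The strain names and values are invented for illustration and do not correspond to Table S7A.

```python
from typing import Dict, Optional

E_VALUE_CUTOFF = 1e-5  # homolog threshold stated in the text


def predicted_bsh_positive(best_e_value: Optional[float]) -> bool:
    """Call BSH+ if the strain's best hit to a known BSH gene passes the cutoff (None = no hit)."""
    return best_e_value is not None and best_e_value < E_VALUE_CUTOFF


def concordance(best_hits: Dict[str, Optional[float]], in_vitro_positive: Dict[str, bool]) -> float:
    """Fraction of strains whose genome-based call matches the in vitro deconjugation result."""
    agree = sum(
        predicted_bsh_positive(best_hits.get(strain)) == observed
        for strain, observed in in_vitro_positive.items()
    )
    return agree / len(in_vitro_positive)


# Hypothetical strains: one genome-predicted BSH+ strain (s4) shows no activity in vitro.
best_hits = {"s1": 1e-40, "s2": 1e-3, "s3": None, "s4": 1e-12}
in_vitro = {"s1": True, "s2": False, "s3": False, "s4": False}
print(concordance(best_hits, in_vitro))  # 0.75
```

Disagreements can arise in either direction. A homolog may be present but not expressed or not active on the substrates tested, and an active enzyme may be too divergent from known BSH genes to pass the cutoff.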

Age-matched adult male C57BL/6 gnotobiotic mice were colonized
with either the BSH hi or BSH lo consortium (assembled prior
to gavage by combining equal numbers of colony-forming units 
of each component strain). A positive control group was colo- 
nized with the entire culture collection. As before, mice in each 
of these three groups (n = 5/group) were given the unsupple- 
mented Bangladeshi diet for 3 days prior to gavage of the 
cultured organisms. Following gavage, mice were maintained 
on the unsupplemented Bangladeshi diet for 8 days, followed 
by the turmeric-supplemented diet for 8 days, and then returned 
to the starting Bangladeshi diet for another 8 days. The complete 
culture collection produced a transit time that was significantly 
faster than that measured with the BSH lo consortium in all three
diet phases (p < 0.002, one-tailed Student’s t test; Table S2D).
While mice harboring the BSH lo consortium had the same transit
time as the BSH hi consortium in the absence of turmeric, addition
of this spice to the Bangladeshi diet produced significantly slower
transit times in BSH lo animals (p = 0.02, one-tailed Student’s t test
comparing phase 1 versus phase 2 transit times; Figure 5A) but 
no significant effects in BSH hi mice (p = 0.7, one-tailed Student’s 
t test). These findings indicate that a gut microbiota capable of
deconjugating bile acids could modify the response to turmeric,
which through its cholekinetic effects delivers increased amounts
of conjugated bile acids to the proximal intestine.

Figure 5. An Interaction between Diet, Bile Acid Metabolism, and Gut Motility Revealed by Colonizing Germ-free Mice with a BSH hi or BSH lo Consortium and Feeding Them a Representative Bangladeshi Diet with or without Turmeric

(A) Turmeric consumption resulted in significantly slower motility in mice colonized with the BSH lo but not the BSH hi consortium.

(B) While consuming the turmeric-supplemented Bangladeshi diet, BSH hi mice had transit times comparable to mice colonized with the complete 53-strain culture collection (“CC”) and faster motility (i.e., shorter transit times) than BSH lo mice (n = 5 animals/microbiota). In this comparison, data for BSH lo and BSH hi are the same as in (A).

(C) Turmeric consumption is associated with a significant increase in total fecal unconjugated bile acid concentrations in gnotobiotic wild-type mice colonized with the BSH hi consortium, but not the BSH lo consortium.

(D) Transit times measured for two groups of gnotobiotic mice colonized with the BSH lo consortium and fed the unsupplemented or turmeric-containing Bangladeshi diet. Measurements occurred at the end of the 10-day monotonous diet experiment.

Statistical significance was determined using a one-tailed Student’s t test. *p < 0.05. In (A), (B), and (D), transit times are represented by mean values ± SEM. In (C), the horizontal line within each box denotes the mean of the measured concentrations. The lower and upper boundaries of each box represent the 25th and 75th percentiles, respectively, while whiskers represent 1.5 times the interquartile range. See also Tables S4, S6, S7, and S8.

Mice colonized with the BSH lo consortium had significantly
slower motility than BSH hi animals only in the setting of turmeric 
(p < 0.03, one-tailed Student’s t test; Figure 5B). UPLC-MS 
confirmed that BSH hi mice had significantly higher total
concentrations of fecal unconjugated bile acids (p = 0.01, one-tailed
Student’s t test) and significantly lower total concentrations of 
conjugated bile acids (p = 0.02, one-tailed Student’s t test) 
compared to mice colonized with the BSH lo consortium
(Figure 5C; Table S6A). Total concentrations of the two unconjugated
primary bile acids (cholic acid and beta-muricholic acid)
were significantly negatively correlated with transit times 
(rho = -0.76, p = 3.6 × 10^-9, Spearman’s rank correlation).
COPRO-seq revealed that turmeric had no statistically significant
effects on the representation of any strain in either consortium
when compared to the unsupplemented diet phases
(p > 0.15, two-tailed Student’s t test). 
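
Most group comparisons in this section use Student's t test with n = 5 mice per group, which gives 8 degrees of freedom. The sketch below computes the pooled-variance t statistic in plain Python and compares it with the tabulated one-tailed critical value for α = 0.05 and 8 degrees of freedom (1.860). It does not compute an exact p value, for which a t-distribution survival function such as scipy.stats.t.sf would be needed. The transit times are invented for illustration.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple

T_CRIT_ONE_TAILED_005_DF8 = 1.860  # tabulated t(0.95, df = 8)


def student_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample Student's t (pooled variance) for H1: mean(a) > mean(b); returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df


# Hypothetical transit times during turmeric feeding (n = 5 mice per consortium).
bsh_lo = [300.0, 320.0, 310.0, 330.0, 340.0]
bsh_hi = [250.0, 260.0, 270.0, 255.0, 265.0]
t_stat, df = student_t(bsh_lo, bsh_hi)
print(df, round(t_stat, 2), t_stat > T_CRIT_ONE_TAILED_005_DF8)
```

A one-tailed test is only appropriate when the direction of the effect (here, slower transit with the BSH lo consortium) is specified before looking at the data; otherwise the two-tailed threshold applies.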

Applying microbial RNA sequencing (RNA-seq) to fecal sam- 
ples collected at the same time points as those used for the 
COPRO-seq analysis, we confirmed significantly greater overall 
levels of community BSH expression in the BSH hi compared to 
the BSH lo consortium’s meta-transcriptome (145-fold; p =
0.006, two-tailed Student’s t test). However, turmeric did not 
result in significant changes in the levels of BSH transcripts 
in the fecal meta-transcriptomes of BSH lo and BSH hi animals
(p > 0.05, two-tailed Student’s t test). Enterococcus faecalis
(isolate ID hG2) was the only member of the BSH lo consortium
that expressed BSH, albeit at low levels (thus explaining the 
presence of fecal unconjugated bile acids in these animals; Fig- 
ure 5C). In the BSH hi consortium, the principal contributors of 
BSH transcripts to the community meta-transcriptome were 
three Bifidobacteria (isolate IDs hBI, hB8, and hF8), which 
together accounted for 76% ± 1% (mean ± SEM) of these
transcripts. In mice colonized with the BSH lo consortium, turmeric
resulted in significant reductions in expression of two transcripts 
in the meta-transcriptome assigned to the KEGG “dioxin degra- 
dation pathway” (5.7-fold; p = 0.004, two-tailed Student’s t test): 
salicylate 1-monooxygenase (EC 1.14.13.1, an oxidoreductase
that produces catechol) and 4-oxalocrotonate tautomerase
(EC 5.3.2.6; part of a metabolic pathway that generates tricarboxylic
acid cycle intermediates). In mice colonized with the
BSH hi consortium, no KEGG pathways were significantly differ- 
entially expressed as a function of turmeric consumption. 
Together, these results suggest that turmeric enhances the 
discordance in motility phenotypes between BSH lo and BSH hi
mice not by changing expression of bacterial BSH genes but 
rather through its cholekinetic effect, thereby providing conju- 
gated bile acids to the two consortia with markedly different 
BSH gene content and deconjugation capacities. 

Effects of Turmeric on Host Gene Expression 

To assess the effects of turmeric on host gene expression, we 
focused on the BSH lo consortium because, in the previous
experiment, it transmitted a turmeric-responsive transit time
phenotype. BSH lo mice were monotonously fed either the
unsupplemented or turmeric-supplemented Bangladeshi diet for
10 days. The capacity of turmeric to significantly slow motility 
was replicated in this new, single-diet-phase experiment (p = 
0.003, one-tailed Student’s t test comparing transit times be- 
tween the two diet-treatment groups; Figure 5D). RNA-seq data- 
sets were generated from the liver and terminal ileum, essential 
components of the enterohepatic circulation. Differentially ex- 
pressed genes were identified using the exact negative binomial 
test. Genes that satisfied our criteria for significant differences in 
expression (after correcting for multiple comparisons) are listed 
in Table S8. Transcriptional data generated from the livers of
mice from the 10-day-long monotonous-diet experiment indi- 
cated that turmeric consumption was followed by a homeostatic 
response that acts to maintain bile acid pool size at constant
levels: i.e., consistent with the increased fecal bile acid levels
elicited by turmeric consumption, Cyp7a1 (cholesterol 7α-hydroxylase)
expression was 3.3-fold lower (p = 0.00002).
Cyp7a1 converts cholesterol into 7α-hydroxycholesterol in the
rate-limiting first step of hepatic bile acid synthesis, a step sub- 
ject to feedback inhibition by increased bile acid concentrations. 

In the terminal ileum, a total of 96 genes exhibited significant 
differential expression in the face of turmeric consumption (Table 
S8A), including greater expression of multiple genes involved in 
gut mucosal immune/barrier function: Retnlb (resistin-like molecule
β, with 410-fold higher expression), Siglec5 (sialic acid
binding Ig-like lectin 5, an eosinophil marker), Fut2 (α-1,2-fucosyltransferase;
a null allele of this gene in humans confers non-secretor
status and is associated with Crohn’s disease [Tong
et al., 2014], while Fut2 deficiency in mice enhances susceptibil- 
ity to infection with eukaryotic and bacterial pathogens [Goto 
et al., 2014; Hurd and Domino, 2004]), Nfil3 (nuclear factor, interleukin 3
regulated, a transcription factor that directs development of innate
lymphoid cells [ILCs]; Geiger et al., 2014; Klose et al., 2014), 
Tnfrsf21 (tumor necrosis factor receptor superfamily member 
21, involved in T helper cell function), and Irf2 (interferon regulatory
factor 2). Retnlb is expressed by intestinal goblet cells and
enterocytes (Hogan et al., 2006) and appears to have various 
effects on immunity, including maintenance of mucosal barrier 
function (Hogan et al., 2006), macrophage activation to produce 
tumor necrosis factor α (TNF-α) (McVay et al., 2006; Steinbrecher
et al., 2011), and protection from gut helminthic infections (e.g., 
by inhibiting migration of worms; Artis et al., 2004; Herbert et al., 
2009). Eosinophils, which, as noted above, express Siglec5, 
contribute to protective immunity against parasites (Knott 
et al., 2007). Nfil3 expression is linked to ILC accumulation;
group 2 ILCs have been implicated in development of protective 
immunity to parasites (Oliphant et al., 2014), while group 3 ILCs 
promote Fut2 expression and mediate resistance to bacterial 
pathogens such as Salmonella (Goto et al., 2014). Turmeric’s 
ability to modulate the distal ileal transcriptome is interesting in 
light of the South Asian Ayurvedic tradition of using crushed 
turmeric as an anti-helminthic (Handral et al., 2013; Nadkarni 
and Nadkarni, 1976). The burden of parasitic infection in this 
population is great (Roy et al., 2011); our findings offer one 
potential biological insight about why turmeric came to be ubiq- 
uitously represented in Bangladeshi cuisine. Though it is unclear 
whether these findings relate to our observed motility pheno- 
types, they may be of anthropologic significance. 

Interplay of the Microbiota, Bile Acids, and the ENS 

To assess the degree to which the effect of turmeric on motility 
was dependent upon ENS-based signaling (Alemi et al., 2013), 
we turned to mice heterozygous for a null allele of the Ret recep- 
tor (Tsuzuki et al., 1995). Ret, which encodes a transmembrane 
protein that binds glial cell-derived neurotrophic factor family li- 
gands, is the gene most commonly implicated in Hirschsprung’s 
disease (Edery et al., 1994; Romeo et al., 1994), a developmental
disorder associated with absent peristalsis in the distal colon. 

Heuckeroth and colleagues have reported that Ret+/- mice
exhibit >90% reductions in longitudinal and circular gut muscle
contractility and 70%-95% reductions in the release of two
neurotransmitters (substance P and VIP) compared to Ret+/+
animals despite having equivalent numbers of enteric neurons
(Gianino et al., 2003). We found that conventionally raised
wild-type (Ret+/+) mice have significantly slower transit times than
their conventionally raised heterozygous (Ret+/-) littermates
(p = 0.05, one-tailed Student’s t test; Table S2F). 

We re-derived C57BL/6 Ret+/- mice as germ-free and colonized
the heterozygotes and their wild-type littermates with
either the seven-member BSH lo or seven-member BSH hi consortium.
Animals were subjected to a three-phase diet oscillation
as in the experiments described above (unsupplemented →
turmeric-supplemented → unsupplemented Bangladeshi diet;
10 days/phase). We hypothesized that if enteric neurons were 
key mediators of the observed phenotypes, then the difference 
in transit times seen between wild-type mice colonized with 
the two different seven-member consortia (Figure 5B) might be 
mitigated in Ret+/- animals. Indeed, in contrast to wild-type
mice, transit times were not significantly different between
Ret+/- mice harboring the two different consortia (Table S2D),
despite the same pattern of differences in fecal bile acid
concentrations that UPLC-MS had documented in wild-type mice:
Ret+/- mice colonized with the BSH lo consortium had significantly
lower fecal concentrations of unconjugated bile acids
compared to Ret+/- mice colonized with the BSH hi consortium
in the setting of turmeric consumption (p = 0.009, one-tailed
Student’s t test). As in wild-type mice, COPRO-seq analysis showed
that turmeric consumption had no significant effects on the
abundances of any of the bacterial strains in the Ret+/- mice.
Notably, a significant difference in transit times between Ret+/-
and wild-type mice was only seen when animals were colonized
with the BSH hi bacterial consortium (diet phase 1: p = 0.009,
phase 2: p = 0.02, phase 3: p = 0.09, one-tailed Student’s
t test; p > 0.05 in all analogous comparisons of Ret+/- versus
wild-type mice colonized with the BSH lo consortium). Thus, while
turmeric consumption has a cholekinetic effect in both wild-type
and Ret+/- mice, the transit time phenotype it produces in
wild-type mice is mediated by gut microbial bile acid metabolism and
a functionally intact ENS. 

DISCUSSION 

Intestinal motility is a key physiologic parameter impacting nutri- 
tional status and gut health. The travel-associated diet changes 
that we model here are increasingly relevant to our daily lives 
during this period of rapid globalization, in which a day spent 
entirely in one’s hometown may nonetheless consist of con- 
sumption of foods representing several of the world’s cultures. 
A gnotobiotic mouse model of global travel could also incorpo- 
rate additional factors that impact the gut microbiota, such as 
disruption of circadian rhythm (Thaiss et al., 2014) or the order 
in which different diets are experienced. Using gnotobiotic 
mice colonized with microbiota obtained from healthy individuals 
representing different geographic and cultural traditions and 
diets representative of those consumed by these donors, we 
were able to dissect factors that interact to define a motility 
phenotype. These diet-microbiota-metabolic interactions were
resolved in the context of Bangladeshi microbiota through a 
multi-pronged strategy that involved (1) manipulating the dietary 
representation of a single culturally relevant spice, turmeric; 
(2) selecting members of a clonally arrayed culture collection 
generated from a Bangladeshi donor’s microbiota, based on 
whether or not they were able to support BSH-mediated bile 
acid deconjugation; and (3) colonizing germ-free mice with or 
without a mutation in Ret, a key regulator of ENS function, with 
a BSH hi or BSH lo consortium.

The simulation of global-travel-associated short-term diet 
shifts revealed that (1) diet-discriminatory bacterial strains 
were represented across microbiota from individuals raised in 
environments that are geographically and culturally distinct 
and (2) correlations between individual bacterial species abun- 
dances and transit times are largely diet dependent, e.g., a given 
bacterial strain can have contrasting correlations with transit 
times depending upon the diet context. These findings suggest 
that future use of bacterial strains derived from the human gut 
as probiotic agents for motility disorders will require thoughtful 
consideration of an individual’s dietary practices and/or adjunct 
dietary recommendations. Reciprocally, our preclinical data 
suggest that dietary treatments for motility disorders need to 
be calibrated based on structural and functional features of an 
individual’s microbiota. In this respect, we find that unconju- 
gated bile acids resulting from bacterial metabolism are consis- 
tently correlated with faster transit times across different diets 
and microbiota. In patients with irritable bowel syndrome, a 
limited number of clinical trials have suggested that a subset of 
patients respond to oral administration of the unconjugated 
bile acid chenodeoxycholate with accelerated transit times 
(Odunsi-Shiyanbade et al., 2010; Rao et al., 2010), consistent 
with our observations. Our results point to fecal BSH activity 
as a functional microbiota parameter that could be useful for cat- 
egorizing individuals with motility disorders, allowing correlation 
analyses to be performed between the levels of its conjugated 
substrates and/or its various unconjugated bile acid products 
and transit time. If significantly correlated within a population, 
this metabolic activity could be used as a target or biomarker 
in clinical studies testing the effects, both short- and long- 
term, of various therapeutic interventions. 
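The correlation analyses proposed above could be run, for example, as a Spearman rank correlation between per-individual fecal BSH activity (or levels of its substrates or products) and transit time. A rank-based measure makes no assumption about the shape of the relationship. The following is a minimal Python sketch that assumes one paired measurement per individual. The function names and the choice of Spearman's rho are illustrative assumptions and are not taken from the study.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors.

    x might be fecal BSH activity and y transit time (hypothetical inputs).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)
```

Under this sketch, a negative rho would mean that individuals with higher BSH activity tend to have shorter, that is faster, transit times.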

The imperfect recovery of transit times following consumption 
of travel diets, initially noted in the context of the six-phase and 
three-phase travel experiments, was also seen in our experi- 
ments involving sequential presentation of an unsupplemented, 
turmeric-supplemented, and unsupplemented Bangladeshi diet. 
Across ecosystems, history (i.e., order and temporal features of 
perturbations) is well known to impact community structure and 
function (Chase, 2003; Fukami and Morin, 2003; Pagaling et al., 
2014). In this context, turmeric may influence alternative stable 
states (Staver et al., 2011) of the gut microbiota and ENS. 
Despite our practice of changing bedding between diet phases 
to limit carryover of ingredients, turmeric may have long-lived 
effects on host physiology. A sustained increase in total bile 
acid pool size evoked by turmeric’s cholekinetic effects seems 
an unlikely explanation for these observations, as bile acid con- 
centrations in the final diet phase returned to pre-turmeric levels 
(Figure 5C). Follow-up experiments examining the duration of
turmeric’s effects on the transcriptome of purified enteric neurons
(including the TGR5 bile acid receptor; Alemi et al., 2013)
in combination with analyses of the microbiota and its metabolic 
features, localization of motility effects (i.e., gastric, small intes- 
tinal, and/or colonic), and production of neuroactive compounds 
such as serotonin by enteroendocrine cells (Yano et al., 2015) 
could reveal and help characterize this postulated turmeric- 
induced alternative stable state. 

Other dietary ingredients (e.g., polysaccharides; Kashyap
et al., 2013) and products of bacterial metabolism (e.g., butyrate;
Soret et al., 2010) have been previously described to impact 
motility in mouse models. Populations experiencing shifting cul- 
tural/culinary traditions through travel, immigration, or emigration 
are susceptible to marked changes in their gut microbiota, both 
structural and functional, which may have downstream health 
consequences. In principle, our approach could be used to iden- 
tify and characterize the biological activities and microbiota inter- 
actions of dietary components characteristic of dietary/cultural 
traditions established over centuries but now vulnerable to dimin- 
ished use due to Westernization. Their exclusion from modern di- 
ets may represent a loss of key food ingredients that could be 
used to promote health in contemporary societies. These ingredi- 
ents may also serve as valuable tools for identifying and charac- 
terizing mechanisms by which food and the microbiota interact to 
affect various features of our physiology. 

EXPERIMENTAL PROCEDURES 

Measurement of Gastrointestinal Transit Times Using Nonabsorbable Red Carmine Dye

Carmine red (Sigma-Aldrich) was prepared as a 6% (w/v) solution in 0.5% 
methylcellulose (Sigma-Aldrich) and autoclaved prior to import into gnotobiotic
isolators. Mice were maintained on a strict 12-hr light cycle (lights on
between 06:00 and 18:00) and gavaged with 150 µl of the carmine solution between
08:00 and 08:30 local time. Animals were not fasted beforehand. Feces
were collected every 30 min (up to 8 hr from time of gavage) and streaked 
across a sterile white napkin to assay for the presence of the red carmine 
dye. The time from gavage to initial appearance of carmine in the feces was 
recorded as the total intestinal transit time for that animal. 
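The transit-time readout is the first sampling time at which carmine appears in the feces. The following is a minimal Python sketch that assumes fecal pellets are scored as red or not red at the 30-min collection times within the 8-hr window. The text does not state how animals with no red feces by 8 hr were handled, so returning None (a right-censored observation) is an assumption. The function name is also hypothetical.

```python
from typing import Optional, Sequence, Tuple

MAX_OBSERVATION_MIN = 8 * 60  # feces collected for up to 8 hr after gavage


def transit_time_min(samples: Sequence[Tuple[int, bool]]) -> Optional[int]:
    """Return minutes from gavage to the first carmine-positive fecal sample.

    samples: (minutes_after_gavage, carmine_seen) pairs, in any order.
    Returns None if no red pellet appears within the observation window;
    treating such animals as censored is an assumption of this sketch.
    """
    for minutes, carmine_seen in sorted(samples):
        if minutes > MAX_OBSERVATION_MIN:
            break  # ignore anything collected after the 8-hr window
        if carmine_seen:
            return minutes
    return None
```

Because sampling is discrete, the true first appearance lies between the last negative collection and the returned time. The reported value therefore carries roughly 30-min resolution.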

Generating a Clonally Arrayed Culture Collection of Anaerobic 
Bacterial Strains from the Fecal Microbiota of a Healthy 24-Month- 
Old Bangladeshi Child 

A clonally arrayed culture collection was generated using methods described in
an earlier publication (Goodman et al., 2011). Each well of the 96-well plate
used to archive the collection contained a monoculture of a single isolate.
Each isolate’s genome was sequenced using an Illumina MiSeq or HiSeq instrument
(250-nt and 101-nt paired-end reads, respectively; 53 ± 4.8-fold (mean ±
SEM) genome coverage). Genomes were assembled using MIRA (Chevreux
et al., 1999) (N50 contig length: 23,253 ± 2,669 bp; range, 735-112,622 bp).
Assemblies were annotated using Prokka (version 1.10) (Seemann, 2014). Predicted
genes were mapped to KEGG pathways by querying the KEGG reference
database (release 72.1) and assigning their protein products to KEGG
Ortholog (KO) groups (BLAST 2.2.29+, blastp E-value threshold < 10^-10, single
best hit defined by E-value and bit score) (Kanehisa and Goto, 2000; Kanehisa
et al., 2014). Species-level taxonomic identities of bacterial isolates were
defined by Sanger capillary sequencing of full-length 16S rRNA gene amplicons
generated using the universal 8F and 1391R PCR primers, with classifications
performed using the Ribosomal Database Project (RDP) version 2.4 classifier
(Ridaura et al., 2013). Strain-level taxonomic classifications were subsequently
determined based on a minimum 96% overall genome sequence identity
(calculated by the software package NUCmer; Kurtz et al., 2004) between isolates
bearing the same 16S rRNA-based taxonomy. A total of 53 unique strains
were identified using this 96% identity cutoff and grown in modified
gut microbiota medium (mGMM) containing ingredients described in a previous
publication (Goodman et al., 2011) but without short-chain fatty acid supplementation.
Strains were stored at −80°C in 15% glycerol (v/v) in reduced
PBS until used for gavage of germ-free mice.
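The strain-level grouping described above can be sketched as single-linkage clustering. Isolates that share a 16S-based taxonomy are merged into one strain whenever a pairwise whole-genome identity (for example, computed from NUCmer alignments) reaches 96%. The transitive merging rule, the input format, and the function name are assumptions for illustration. The text does not specify how isolates with intermediate pairwise identities were resolved.

```python
from typing import Dict, List, Tuple

IDENTITY_CUTOFF = 96.0  # minimum % overall genome identity to share a strain


def group_into_strains(
    taxonomy: Dict[str, str],
    identity: Dict[Tuple[str, str], float],
) -> List[List[str]]:
    """Cluster isolates into strains with union-find (single linkage).

    taxonomy: isolate id -> 16S rRNA-based species assignment.
    identity: (isolate_a, isolate_b) -> % overall genome identity.
    Only pairs with the same 16S taxonomy are eligible for merging.
    """
    parent = {iso: iso for iso in taxonomy}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), pct in identity.items():
        if taxonomy[a] == taxonomy[b] and pct >= IDENTITY_CUTOFF:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for iso in taxonomy:
        groups.setdefault(find(iso), []).append(iso)
    return sorted(sorted(g) for g in groups.values())
```

Single linkage can chain isolates A and C into one strain through B even if A and C fall below 96% identity to each other. A stricter rule, such as complete linkage, would be an equally plausible reading of the cutoff.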

In Vitro Assays for Bile Acid Deconjugation 

We screened each of the 91 isolates, comprising 53 strains, in the clonally ar- 
rayed culture collection for their capacity to deconjugate bile acids. The two 
predominant primary bile acids in mice, taurocholic acid (TCA; Sigma-Aldrich) 
and tauro-beta-muricholic acid (TbMCA; Santa Cruz Biotechnology), were dis- 
solved in water at concentrations of 100 mg/ml and 10 mg/ml, respectively. 
Each isolate was first incubated in 1 ml mGMM containing 100 µM TCA for
48 hr in an anaerobic Coy chamber (75% N2, 20% CO2, and 5% H2), with
growth monitored based on optical density at 600 nm (OD600). Cells were
then pelleted by centrifugation (17,900 × g for 7 min at 4°C), and the resulting
supernatant was subjected to UPLC-MS (see Supplemental Experimental Procedures)
to assess levels (peak intensities) of unconjugated and conjugated
bile acids (cholic acid and TCA, respectively). For the vast majority of isolates,
bile acid profiles at 48 hr were either entirely conjugated or entirely unconjugated.
Growth of bacterial isolates was simultaneously assessed by measuring OD600 to ensure
that a lack of deconjugation was not simply a reflection of a lack of bacterial
viability. If no deconjugation was observed, we performed a secondary screen
in which isolates were incubated in 1 ml mGMM containing 100 µM TbMCA for
48 hr in an anaerobic chamber, followed by UPLC-MS quantitation of the levels
of unconjugated (beta-muricholic acid) and conjugated (TbMCA) bile acids.
Bacterial isolates that did not deconjugate either bile acid in vitro were considered
eligible for the BSHlo consortium. See Supplemental Experimental Procedures
for additional protocols.
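The two-tier screen reduces to a short decision procedure. The following is a hypothetical Python sketch. The function names, the use of a single OD600 value per isolate, the growth cutoff, and the 50% unconjugated-fraction threshold for calling deconjugation are all assumptions. The text states only that most isolates showed either fully conjugated or fully unconjugated profiles at 48 hr.

```python
from dataclasses import dataclass
from typing import Optional

MIN_OD600 = 0.1              # hypothetical cutoff for "grew in culture"
DECONJUGATED_FRACTION = 0.5  # hypothetical: >50% unconjugated = deconjugation


@dataclass
class BileAcidPeaks:
    conjugated: float    # UPLC-MS peak intensity, e.g., TCA or TbMCA
    unconjugated: float  # e.g., cholic acid or beta-muricholic acid


def deconjugates(peaks: BileAcidPeaks) -> bool:
    total = peaks.conjugated + peaks.unconjugated
    return total > 0 and peaks.unconjugated / total > DECONJUGATED_FRACTION


def classify_isolate(od600: float, tca: BileAcidPeaks,
                     tbmca: Optional[BileAcidPeaks] = None) -> str:
    """Primary screen with TCA; secondary TbMCA screen only if needed."""
    if od600 < MIN_OD600:
        # Without growth, absent deconjugation may only reflect poor viability.
        return "inconclusive: no growth"
    if deconjugates(tca):
        return "deconjugator"
    if tbmca is None:
        return "pending TbMCA screen"
    if deconjugates(tbmca):
        return "deconjugator"
    return "BSHlo-eligible"
```

Running the secondary screen only on TCA-negative isolates mirrors the order described above, so each isolate receives at most two UPLC-MS runs.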
In Brief

A protein associated with the mitotic 
spindle must undergo a phase transition 
to promote microtubule polymerization 
and spindle assembly, suggesting that 
the biophysical properties associated 
with liquid demixing may shape the 
characteristics of a hypothesized but 
elusive spindle matrix. 
SUMMARY

Spindle assembly required during mitosis depends 
on microtubule polymerization. We demonstrate 
that the evolutionarily conserved low-complexity 
protein, BuGZ, undergoes phase transition or coac- 
ervation to promote assembly of both spindles and 
their associated components. BuGZ forms tempera- 
ture-dependent liquid droplets alone or on microtu- 
bules in physiological buffers. Coacervation in vitro 
or in spindle and spindle matrix depends on hydro- 
phobic residues in BuGZ. BuGZ coacervation and 
its binding to microtubules and tubulin are required 
to promote assembly of spindle and spindle matrix 
in Xenopus egg extract and in mammalian cells. 
Since several previously identified spindle-associ- 
ated components also contain low-complexity re- 
gions, we propose that coacervating proteins may 
be a hallmark of proteins that comprise a spindle 
matrix that functions to promote assembly of spin- 
dles by concentrating its building blocks. 

INTRODUCTION 

Since the discovery of the spindle apparatus in the 1800s (Lukacs,
1981), much attention has focused on how microtubules (MTs)
interact with chromosomes to ensure equal partitioning of chromosomes
into daughter cells. Investigation of the mechanisms
by which MTs and MT-associated proteins regulate mitosis
(Walczak et al., 2010) is fueled by the ease of visualizing the spindle-shaped
MT fibers, the disruption of chromosome segregation
and cell division upon MT perturbation, and the discovery of
tubulin (Oakley, 2000). In addition to spindle MTs, material
that surrounds and permeates spindle MTs has periodically
drawn attention (Goldman and Rebhun, 1969; Schibler and Pickett-Heaps,
1980; Scholey et al., 2001; Johansen and Johansen,
2007; Johansen et al., 2011; Leslie et al., 1987; Pickett-Heaps 
et al., 1984; Pickett-Heaps and Forer, 2009; Schweizer et al., 
2014; Wein et al., 1998; Zheng, 2010; Zheng and Tsai, 2006). 

Historically, this ill-defined spindle-associated material has
been referred to as the spindle matrix. One vague but generally
accepted feature of the spindle matrix is that it retains some integrity
upon MT disassembly. Based on this criterion, several spindle
matrix proteins have been identified and studied in the context
of spindle assembly and chromosome segregation. For
example, among the Drosophila spindle matrix proteins (Fabian
et al., 2007; Johansen et al., 2011; Qi et al., 2004, 2005; Rath
et al., 2004; Walker et al., 2000; Yao et al., 2012, 2014), Megator
regulates the spindle assembly checkpoint (SAC) (Lince-Faria
et al., 2009). A conserved protein, BuGZ, which was identified
as part of the lamin-B (LB) spindle matrix in Xenopus (Tsai 
et al., 2006; Ma et al., 2009), has recently been shown to facilitate 
chromosome alignment by controlling both stability and kineto- 
chore loading of the SAC component Bub3 (Jiang et al., 2014; 
Toledo et al., 2014). Additionally, LB (Tsai et al., 2006) and poly 
ADP-ribose (Chang et al., 2004), along with other spindle assem- 
bly factors (SAFs), such as dynein, Nudel, NuMA, and kinesin 
Eg5 (Civelekoglu-Scholey et al., 2010; Goodman et al., 2010; 
Ma et al., 2009; Tsai et al., 2006), may regulate spindle morphogenesis.
Despite these studies, the structural nature of the
spindle matrix remains undefined, and whether it constitutes a
cohesive functional unit is unclear. In fact, some modeling and
biophysical probing of the spindle apparatus have not provided
evidence for the existence of a spindle matrix (Brugués and
Needleman, 2014; Gatlin et al., 2010; Shimamoto et al., 2011).
Thus, whether the spindle matrix is a real structural element of the spindle
apparatus or a mere artifact induced upon depolymerization
of spindle MTs remains an open question.

Unlike membranous organelles, the spindle apparatus is not
surrounded by a membrane barrier during vertebrate mitosis.
However, spindles may need to concentrate many components
in order to support spatially and temporally diverse reactions.
Consistent with this, tubulin and some SAFs have been shown to be
concentrated in the region where the nascent spindle begins to assemble
in Caenorhabditis elegans embryos (Hayashi et al., 2012). This
concentration process is independent of MTs but it requires 
nuclear envelope permeabilization and RanGTPase, which 
stimulates spindle assembly (Kalab et al., 1999; Ohba et al., 
1999; Wilde and Zheng, 1999). 

Proteins, such as elastin and elastin-like peptides, can 
undergo liquid-liquid phase transition or coacervation to form 
liquid droplets (Yeo et al., 2011). The phase separation has 
been proposed to promote concentration of molecules into the 
liquid droplets, which can then facilitate biochemical reactions
(Hyman et al., 2014).

Figure 1. BuGZ Promotes Spindle Assembly Independent of Kinetochores

(A) Western blotting of xBuGZ depletion (left) and add-back (right) in extracts. xBuGZ depletion (dep) efficiency and xBuGZ addition are shown in titrations.

(B-E) Representative images (B) show that xBuGZ depletion reduced astral MT length, bipolar spindle formation, and spindle length, which were all rescued by His-xBuGZ. Approximately 50 (C and D) or 500 (E) structures were measured in each experiment and condition. White dashed lines in (B) indicate the spindle length and the longest astral MTs measured for Aurora A-bead spindles.

(F-H) xBuGZ depletion caused multiple sperm spindle defects (F), which were rescued by His-xBuGZ (G and H). Approximately 500 (G) and 50 (H) structures were analyzed in each experiment and condition.

Indeed, the liquid droplet feature of P granules and nucleoli is consistent with the idea that assembly and
function of these non-membranous organelles could be driven 
by the phase transition of some of their structural components 
(Brangwynne et al., 2009, 2011). No proteins of these organelles, 
however, have yet been shown to undergo functionally relevant 
phase transition. Interestingly, when engineered as multiple 
tandem repeats, SRC homology 3 (SH3) domains of NCK and 
proline-rich motif (PRM) of N-WASP form multivalent interac- 
tions, which allow the protein mixture to undergo phase transi- 
tion to form liquid droplets. These droplets concentrate actin to 
promote F-actin assembly in vitro (Li et al., 2012). Despite the
observed in vitro phase transition into liquid droplets, no proteins
have yet been shown to function in vivo via phase transition.

Here, we examine the spindle regulatory protein BuGZ, which 
we noted contains evolutionarily conserved low complexity 
sequence, and demonstrate that it forms a MT-independent 
structure through temperature- and hydrophobic residue- 
dependent coacervation. This phase transition property allows 
the concentration of tubulin along MTs and supports assembly 
of spindle MTs and of the biochemically defined spindle matrix 
structure. Based on these results, we propose a model and
a line of investigation for further developing our understanding of
the observed properties and possible functions of the spindle matrix.

RESULTS 

BuGZ Promotes Assembly of Spindle Apparatus 

Our previous studies showed that BuGZ binds MTs to promote
kinetochore loading of Bub3 and chromosome alignment (Jiang
et al., 2014). We noticed that depletion of human BuGZ (hBuGZ) in
HeLa cells disrupted spindle morphology and reduced MT intensity
more severely than depletion of Bub3, especially when RNAi treatment
was extended to 72 hr (Figures S1A and S1B). The more severe spindle defects
in hBuGZ-depleted cells were accompanied by stronger chromosome
misalignment than in cells depleted of hBub3 (Figure S1C). This suggests
that BuGZ could directly regulate spindle assembly independent of
Bub3's kinetochore function.

Previously, we developed a bead-based spindle assembly
assay (Tsai and Zheng, 2005) by tethering the mitotic kinase
Aurora A to 2.8-µm magnetic beads via antibodies. These beads
function as MT organizing centers to induce efficient spindle assembly
in cytostatic factor (CSF)-arrested Xenopus egg
extract (referred to as extract below) in the presence of RanGTP.
Since spindles induced by Aurora A beads and RanGTP do not
have chromosomes and kinetochores, we can test the kinetochore-independent
function of BuGZ in spindle assembly. Immunodepletion
of Xenopus BuGZ (xBuGZ) by ~90% (Figure 1A)
resulted in a significant reduction of astral MT length and bipolar
spindle numbers (Figures 1B, 1C, and 1E). Most bipolar spindles
formed in the absence of xBuGZ were also significantly shorter
than those of controls (Figure 1D). These defects were fully
rescued by purified xBuGZ (Figures 1A-1E). xBuGZ depletion
also disrupted spindle assembly induced by sperm chromatin.
Major phenotypes included spindles with MT aggregates surrounding
sperm chromatin or spindles with reduced MTs, followed by
asters, half spindles, or abnormal spindle shapes with
normal MT density (Figures 1F-1H), all of which were
also rescued by purified xBuGZ. Thus BuGZ promotes
spindle assembly independent of its kinetochore function.

BuGZ-MT Interaction Promotes Spindle Assembly 

To understand how BuGZ promotes spindle assembly, we treated
HeLa cells with control or hBuGZ siRNA and then depolymerized
MTs in the cold. MT regrowth was examined after returning cells to
37°C. hBuGZ depletion greatly reduced astral MT regrowth, which
was rescued by expressing RNAi-insensitive wild-type mouse
BuGZ (mBuGZ; Figures 1I and 1J). The N-terminal 92 amino acids
of BuGZ bind directly to MTs, while the Gle2-binding sequence
(GLEBS) within the C terminus of BuGZ directly binds and stabilizes
Bub3 (Jiang et al., 2014). Replacing the two highly conserved
glutamic acids (E) in GLEBS with alanine (A) results in a mutant
(mBuGZAA) that fails to bind and stabilize Bub3, while mBuGZΔN,
lacking the N-terminal 92 amino acids, binds neither to spindles
in vivo nor to MTs in vitro (Jiang et al., 2014). Wild-type mBuGZ
and mBuGZAA bound to spindle MTs and to MTs assembled from
pure tubulin (Figures S1D-S1F). To analyze which of these two
known domains in BuGZ promotes spindle assembly, we
depleted endogenous hBuGZ from HeLa or U2OS cells by
RNAi. BuGZ RNAi-induced spindle defects, judged by spindle
MT intensity, were rescued fully by wild-type mBuGZ and partially
by mBuGZAA, but not by mBuGZΔN (Figures 1J-1L). hBuGZ
depletion did not alter interphase MT densities (Figure S1G).
Thus MT binding of BuGZ promotes spindle MT assembly.

Spindle Matrix Assembly and Stability Require BuGZ and 
a Physiological Temperature 

Since BuGZ was identified as a spindle matrix component, we 
assayed for spindle matrix by assembling Aurora A-bead spin- 
dles in extract and then depolymerizing MTs using nocodazole 
at room temperature (RT) (Ma et al., 2009; Tsai et al., 2006). 

The nocodazole-insensitive material that remains on the Aurora
A beads, i.e., the spindle matrix, was isolated using a magnet
and analyzed by western blotting or immunostaining for
known spindle matrix markers: lamin-B3 (LB3, the major lamin
in extracts), dynein, Eg5, NuMA, and XMAP215 (Ma et al.,
2009; Tsai et al., 2006). Depleting xBuGZ greatly diminished recovery
of the spindle matrix, but this was rescued by purified
xBuGZ (see the Noc, RT panels in Figures 2A-2C). Although
xBuGZ depletion diminished the recovery of LB3, depleting LB3
did not affect association of xBuGZ with the spindle matrix (Figure S2A).
Thus, BuGZ may function upstream of LB3 to promote
spindle matrix assembly.

When spindle MTs were depolymerized by nocodazole on ice, fewer matrices
were associated with Aurora A beads than with beads incubated at RT (compare
IgG matrix panels in Figures 2A and 2B). Quantification of LB3 staining revealed
a significant reduction of matrices around beads upon cold treatment or upon
xBuGZ depletion (Figure 2C). xBuGZ depletion plus cold treatment caused an
additional matrix reduction that could be rescued by purified xBuGZ (Figure 2C).
Thus spindle matrix assembly and stability require BuGZ and a physiological
temperature.

Figure 1 (continued)

(I) hBuGZ depletion reduced astral MT regrowth in mitotic HeLa cells, which was rescued by mBuGZ. Cold-treated cells were examined at 3 or 9 min after returning to 37°C. MTs, centromeres, and chromosomes were stained by tubulin antibody, CREST serum, and DAPI, respectively.

(J) Western blotting analyses of HeLa and U2OS cells treated with hBuGZ siRNA and transfected with the indicated plasmids.

(K and L) HeLa or U2OS images (K) show that hBuGZ depletion by 72 hr of RNAi diminished MT intensity in spindles, which was rescued fully by mBuGZ, partially by mBuGZAA, but not by mBuGZΔN. Cells were blocked with 10 µM MG132 for 1 hr before immunostaining. Approximately 30 cells were measured for each experiment and condition (L).

Error bars, SEM. Student’s t test: *p < 0.05, **p < 0.01, ***p < 0.001; three independent experiments. Scale bars, 5 µm. See also Figure S1.

Figure 2. Effects of BuGZ and Temperature on Spindle Matrix

(A-C) Both BuGZ and temperature influenced spindle matrix assembly. Spindle matrices were prepared with or without xBuGZ after nocodazole (Noc) treatment at RT or on ice and assayed by western blotting (A) or immunostaining (B) using the indicated markers. Tubulin, negative control. LB3 intensity of ~30 spindle matrices associated with one or two beads was quantified (C). Less spindle matrix was present in the cold than at RT. Green Aurora A beads appear larger than 2.8 µm due to secondary anti-rabbit antibody staining.

(D) Taxol-treated HeLa cells were incubated at RT or on ice. Metaphase cells were visualized by immunostaining with tubulin (green) and hBuGZ (red) antibodies and DAPI (blue). Approximately 100 cells were quantified for each condition.

(E) Treatment of Taxol-stabilized Aurora A spindles on ice for 5 min diminished xBuGZ signal on spindles, visualized by fluorescein-labeled MTs (red) and xBuGZ immunostaining (green). Approximately 50 spindles were quantified for each condition.

(F) Extraction of Taxol-stabilized metaphase spindles in HeLa cells on ice diminished hBuGZ signal on spindles compared to extraction at RT. Metaphase cells shown were immunostained using tubulin and hBuGZ antibodies and DAPI. Approximately 100 cells were quantified for each condition.

Error bars, SEM. Student’s t test: *p < 0.05, **p < 0.01, ***p < 0.001; three independent experiments. Scale bar, 5 µm. See also Figure S2.



BuGZ Exhibits Temperature-Sensitive Binding to Spindle MTs

Unlike many MT-associated SAFs that decorate MT fibers densely and brightly,
BuGZ appears as a loose “haze” enriched on spindles (Figure S1E) (Jiang
et al., 2014). When HeLa cells were incubated at RT or on ice
for 5 min followed by immunostaining, we found that cold treatment
diminished BuGZ signal on spindles, whether or not the
spindle MTs were stabilized with Taxol (Figures 2D and S2B).
xBuGZ signal was also reduced on cold-treated Aurora A-bead
spindles stabilized by Taxol (Figure 2E). 

We then treated HeLa cells with Taxol and collected mitotic cells
by shake-off. Detergent extraction of these cells on ice or at RT in 
the presence of Taxol followed by immunoblotting showed that 
more hBuGZ was extracted in the cold as compared to tubulin 
or CENP-A controls (Figure S2C). Immunostaining further indi- 
cated that spindle-associated hBuGZ was more sensitive to 
the extraction on ice than at RT (Figure 2F). 

BuGZ Exhibits Temperature-Dependent Phase 
Transition via Conserved Phenylalanine and Tyrosine 

We analyzed vertebrate BuGZ protein sequences using PONDR 
and SEG, programs designed to predict the disordered (Xue 
et al., 2010) and low complexity regions (Wootton, 1994) in pro- 
teins, respectively. The N terminus of BuGZ, containing the MT 
binding domain and zinc fingers, was predicted to have normal 
amino acid complexity, while the rest of BuGZ was largely un- 
structured with low amino acid complexity (Figure 3A). Since 
some disordered and low complexity (DLC) proteins can 
undergo phase transition, we examined Sf9 cells expressing 
YFP-tagged xBuGZ (YFP-xBuGZ) via baculovirus. YFP-xBuGZ 
formed bright droplet-like spheres in the cytosol, whereas YFP 
was evenly distributed (Figure 3B). 

We purified His-, GST-, GFP-, or YFP-tagged xBuGZ or
mBuGZ expressed in either Sf9 cells or bacteria (Figures 3C
and S3A). Upon warming, each protein formed droplets of varying
sizes in physiologically relevant buffers, and the droplet size
increased over time (Figures 3D, S3B, and S3C). When the solu- 
tions were cooled on ice, the droplets disintegrated over time, as 
judged by the disappearance of fluorescence in the droplets 
(Figure S3D). Under the same conditions and the same or 
much higher concentrations, purified GST, GFP, YFP, or other 
SAFs such as GFP-EB1 did not form droplets (Figure S3E). 
Live imaging showed that with 100 µM YFP-xBuGZ, small droplets
became visible at ~10°C and larger droplets formed upon
further increase in temperature (Figure 3E).

Turbidity assay showed that purified YFP-xBuGZ underwent 
an abrupt increase in solution turbidity above a critical tempera- 
ture, and the process was reversible upon cooling to the same 
temperature (Figure 3F). After dissolution of droplets on ice, 
BuGZ underwent the same degree of phase transition upon 
warming, indicating that the phase separation was repeatable 
(Figure 3G). Formation of coacervates at high concentrations
of YFP-xBuGZ eventually led to large-scale phase separation
that became visible to the naked eye as protein settled
at the bottom of cuvettes (Figure 3H). By varying protein concentrations
and temperature, we found that a lower concentration of
YFP-xBuGZ required a higher temperature for coacervation (Figure 3I).
YFP-xBuGZΔN exhibited coacervation properties similar to those of
YFP-xBuGZ, especially at higher protein concentrations (Figures 3I, 3J,
and 3N), indicating that the MT-binding sequence is
dispensable for phase transition in vitro. By contrast, YFP did not
undergo any noticeable phase transition at equivalent concentrations
and temperatures (Figure 3K).
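One simple way to read a transition temperature from a turbidity-versus-temperature trace such as the one in Figure 3F is to find where turbidity first crosses a threshold, interpolating linearly between readings. The following is a minimal Python sketch. The threshold, the input format, and the function name are assumptions and do not represent the authors' analysis. Applied to traces recorded at several protein concentrations, it would tabulate the relationship described above, in which lower YFP-xBuGZ concentrations require higher temperatures.

```python
from typing import Optional, Sequence


def onset_temperature(temps: Sequence[float], turbidity: Sequence[float],
                      threshold: float) -> Optional[float]:
    """Return the temperature at which turbidity first reaches `threshold`.

    Readings must be ordered by increasing temperature. The crossing is
    linearly interpolated between the two bracketing readings; None is
    returned if the threshold is never reached (no phase transition seen).
    """
    if len(temps) != len(turbidity):
        raise ValueError("temps and turbidity must be the same length")
    for i, (t, a) in enumerate(zip(temps, turbidity)):
        if a >= threshold:
            if i == 0:
                return t  # already turbid at the first reading
            t0, a0 = temps[i - 1], turbidity[i - 1]
            # a0 < threshold <= a, so the denominator is positive.
            return t0 + (threshold - a0) * (t - t0) / (a - a0)
    return None
```

Because reversibility (Figure 3F) means a cooling trace can be analyzed the same way, comparing onset values on warming and cooling would give a simple check for hysteresis.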

Because BuGZ droplets form upon warming and dissolve upon cooling, these
findings suggest that BuGZ could undergo intermolecular interactions
mediated by hydrophobic residues. We found that BuGZ orthologs are all
abundant in proline (P). For example, mBuGZ and xBuGZ contain 15%-19% P,
which often occur next to hydrophobic residues (Figure S3F). The hydrophobic
and aromatic residues phenylalanine (F) and tyrosine (Y) are
implicated in hydrogel formation of some nucleoporins (Frey and 
Gorlich, 2007), transcription factors, and RNA binding proteins 
(Kato et al., 2012; Kwon et al., 2013). Since both F and Y are 
highly conserved among vertebrate BuGZ (Figure S3F), we 
mutated the last 5 or all 13 conserved Fs and Ys in the predicted 
DLC region of xBuGZ to serine (S) to create YFP-xBuGZ5S or 
YFP-xBuGZ13S (Figure S3F). Coacervation of YFP-xBuGZ5S 
and YFP-xBuGZ13S required increasingly higher concentration 
and temperature (Figures 3L-3N). Thus conserved Fs and Ys 
are required for BuGZ phase transition. 

We found that a fragment of xBuGZ corresponding to amino 
acids 258-334 (xBuGZ-B) (Figure S3F, black underline), which 
did not form droplets on its own (Figure S3E), inhibited the phase 
transition of YFP-xBuGZ when used at high concentrations (Figure S3G).
Replacing the two Fs and one Y in xBuGZ-B with S (xBuGZ-B3S,
which also did not form droplets on its own; Figure S3E) largely
disrupted the inhibitory property (Figures S3G-S3I). At concentrations
that did not fully block His-xBuGZ droplet formation,
GFP-xBuGZ-B strongly incorporated into the droplets as 
compared to GFP-xBuGZ-B3S (Figure S3J). Thus, intermolec- 
ular BuGZ interactions in droplets may be mediated, in part, by 
Fs and Ys. Moreover, xBuGZ-B may disrupt droplet formation 
by blocking proper alignment of full-length xBuGZ molecules 
critical for coacervation. 

BuGZ Bundles MTs via MT Binding and Phase Transition 

We incubated 0-4 µM YFP-xBuGZ, YFP-xBuGZΔN, YFP-xBuGZ13S,
or YFP with rhodamine-labeled and Taxol-stabilized
short MTs at 37°C. xBuGZ (4 µM), but not xBuGZΔN, xBuGZ13S,
or YFP, caused prominent MT bundling, although careful inspection
showed that some MT bundles were formed even at 1 µM
xBuGZ (Figure 4A). The bundled MTs were longer and brighter
than the input MT fragments. To quantify the bundling activity,
we measured the length and average brightness of individual
MTs or MT bundles formed with 2 µM of the different xBuGZ proteins
or YFP, because MT bundles at higher xBuGZ concentrations
became a network. xBuGZ increased MT bundle length and
average intensity, but xBuGZΔN, xBuGZ13S, or YFP did not
(Figures 4B and 4C). xBuGZ-B inhibited MT bundling induced 
by xBuGZ, while xBuGZ-B3S was much less effective (Figures 
4B and 4C). Thus, MT binding and phase transition of BuGZ pro- 
mote MT bundling. 

Although 2 or 4 µM YFP-xBuGZ, YFP-xBuGZΔN, YFP-xBuGZ13S,
and YFP did not form visible droplets at 37°C
(Figure S4A), when incubated with MT seeds, YFP-xBuGZ was
enriched on MT bundles, and small YFP-xBuGZ droplets were
visible along some bundles (arrowheads, Figures 4D and S4B).
Line scans showed that the droplets tended to appear at sites
of thicker MT bundles flanked by thinner regions (Figure 4D).
YFP-xBuGZΔN and YFP-xBuGZ13S failed to form MT-associated
droplets, even though YFP-xBuGZ13S showed MT association
(Figure S4B). MT pelleting also showed that, despite an intact
MT binding domain and similar MT binding at 4°C, significantly
less YFP-xBuGZ13S than YFP-xBuGZ bound to MTs at 37°C
(Figures 4E and 4F). When high concentrations (60-100 µM) of
YFP-xBuGZ were used to induce MT bundling, large deformed
BuGZ droplets extended beyond the associated MT bundles
(Figure S4C), suggesting that the MT bundles could flatten the
droplets when local BuGZ was not in excess. Thus BuGZ preferentially
undergoes phase transition along MTs, while the multivalent
MT binding sites in BuGZ coacervates in turn promote MT
bundling.

BuGZ Phase Transition Concentrates Tubulin and 
Promotes MT Polymerization 

When different xBuGZ were incubated with rhodamine-tubulin in 
the presence of nocodazole to prevent MT assembly, tubulin 
was concentrated in YFP-xBuGZ, but not in YFP-xBuGZAN, 
droplets (Figure 5A). When incubated with beads coated with 
different forms of BuGZ at 4°C in the presence of nocodazole, 
purified tubulin was pulled down by YFP-xBuGZ or YFP- 
xBuGZ13S, but not by YFP-xBuGZAN (Figure 5B). 

Next, purified BuGZ and tubulin were combined on ice in the presence of nocodazole, incubated at 37°C, and the droplets were then pelleted (Figure 5C). Both YFP-xBuGZ and YFP-xBuGZΔN were enriched many-fold in the droplets relative to their initial solution concentrations (Figures 5D and 5E). Tubulin was greatly enriched in the YFP-xBuGZ, but not the YFP-xBuGZΔN, droplets (Figures 5D and 5E). Neither YFP-xBuGZ13S nor tubulin was enriched in the pellet fractions (Figure 5F).
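The enrichment comparison above reduces to simple arithmetic once pellet protein has been quantified against a BSA standard on a Coomassie gel, as described in the Experimental Procedures. The Python sketch below is an illustration under stated assumptions, not the authors' analysis: it fits a linear BSA standard curve, converts a pellet band to protein mass, and divides the resulting droplet-phase concentration by the initial solution concentration. The droplet (pellet) volume, the molecular weight, and all numbers are hypothetical inputs; the text does not report how droplet volume was determined, and the sketch assumes band intensities fall in the linear range of the stain.

```python
"""Hypothetical sketch: fold enrichment of a protein in pelleted droplets."""


def fit_standard_curve(mass_ng, intensity):
    """Ordinary least-squares line: intensity = slope * mass + intercept."""
    n = len(mass_ng)
    mean_x = sum(mass_ng) / n
    mean_y = sum(intensity) / n
    sxx = sum((x - mean_x) ** 2 for x in mass_ng)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(mass_ng, intensity))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mass_from_band(intensity, slope, intercept):
    """Invert the standard curve to get protein mass (ng) in a band."""
    return (intensity - intercept) / slope


def fold_enrichment(pellet_ng, droplet_volume_ul, mw_kda, initial_um):
    """Droplet-phase concentration divided by the initial concentration.

    Concentration in ng/µl divided by molecular weight in kDa equals µM.
    droplet_volume_ul is an assumed input, not a value from the paper.
    """
    droplet_um = (pellet_ng / droplet_volume_ul) / mw_kda
    return droplet_um / initial_um


# Hypothetical BSA standards (ng) and their band intensities (arbitrary units).
slope, intercept = fit_standard_curve([0, 250, 500, 1000], [0, 50, 100, 200])
pellet_ng = mass_from_band(150, slope, intercept)
fold = fold_enrichment(pellet_ng, droplet_volume_ul=1.0, mw_kda=75.0, initial_um=1.0)
print(f"pellet mass {pellet_ng:.0f} ng, fold enrichment {fold:.1f}x")
```

The unit shortcut in `fold_enrichment` (ng/µl ÷ kDa = µM) avoids explicit molar conversions; for example, 1 mg/ml of a 66.5 kDa protein comes out near 15 µM.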

Since MT assembly is greatly aided by high tubulin concentrations (Caudron et al., 2000), the concentration of tubulin by BuGZ droplets could lead to enhanced MT assembly. Indeed, in vitro MT assembly assays (Oegema et al., 1999; Zheng et al., 1995) showed that mBuGZ and xBuGZ, but not YFP, YFP-xBuGZΔN, or YFP-xBuGZ13S, stimulated MT assembly even at low concentrations (Figures 5G, 5H, and S5A-S5C). xBuGZ-B, when present in excess, inhibited the xBuGZ-stimulated MT polymerization, whereas xBuGZ-B3S was much less effective (Figure S5D). Thus, the BuGZ-stimulated tubulin concentration and MT polymerization require both MT/tubulin binding and phase transition of BuGZ.

We estimated endogenous xBuGZ in extracts to be ~0.1 µM, and polyethylene glycol, a crowding agent, induced purified xBuGZ at this concentration to undergo phase transition in vitro (Figures S5E and S5F). Importantly, the spindle concentration of xBuGZ was estimated as 0.5-0.86 µM (Figure S5G). Thus, endogenous BuGZ could undergo phase transition to promote MT polymerization during spindle assembly.

BuGZ Exhibits Phase Transition Property in Spindle 
Matrix 

If BuGZ in droplets undergoes continuous exchange with BuGZ in solution, wild-type and mutant forms of YFP-xBuGZ should differ in their ability to exchange into preformed His-xBuGZ droplets. Indeed, YFP-xBuGZ and YFP-xBuGZΔN exchanged into the His-xBuGZ droplets efficiently, while YFP-xBuGZ5S and YFP-xBuGZ13S exhibited weak and background incorporation, respectively (Figure 6A). We also created two additional GFP-tagged fragments, xBuGZ-A and xBuGZ-C, corresponding to aa 111-187 and aa 376-452 in the DLC region (Figure S3F, pink and orange underlines), and their F/Y mutants, xBuGZ-A3S and xBuGZ-C5S, respectively. Similar to xBuGZ-B, these fragments did not form droplets on their own (Figure S6A). GFP-xBuGZ-A, GFP-xBuGZ-B, and GFP-xBuGZ-C, but not their mutants, exchanged into and disrupted the His-xBuGZ droplets (Figure S6B; also see Figure S3G).

Next, we incubated isolated spindle matrices with 0.1 µM wild-type or mutant YFP-xBuGZ. YFP-xBuGZ and YFP-xBuGZΔN exchanged strongly into spindle matrices (marked by LB3), while YFP-xBuGZ5S exhibited weak exchange (Figure 6B). YFP-xBuGZ13S exhibited only background incorporation into the matrix, similar to the YFP control (Figure 6B). When isolated spindle matrices were incubated with 0.5 µM GFP-xBuGZ-A, GFP-xBuGZ-B, or GFP-xBuGZ-C to disrupt the coacervation of endogenous xBuGZ, each fragment, but not its mutant, incorporated into and reduced the size of the matrix (Figure S6C). Therefore, phase transition is required both for BuGZ incorporation into preformed spindle matrix and for maintenance of that matrix, indicating that BuGZ exhibits a phase transition property in the spindle matrix.

Phase Transition and Tubulin Binding of BuGZ Promote 
MT Assembly from Spindle Matrix 

The exchange of YFP-xBuGZ or YFP-xBuGZΔN into the spindle matrix replaced most endogenous xBuGZ, while YFP-xBuGZ5S and YFP-xBuGZ13S exhibited progressively less replacement of



Figure 3. Temperature- and Concentration-Dependent Phase Transition of BuGZ 

(A) Sequence features of xBuGZ. The line at 0.5 (y axis) is the cutoff for disorder (>0.5) and order (<0.5) predictions. P-FIT, VSL2B, VL3, and VLXT are predictors of disorder disposition. LC1, LC2, and LC3 indicate low complexity regions determined at three stringencies. ZnF, zinc fingers, predicted structured region.

(B) YFP-xBuGZ formed spheres in Sf9 cells. YFP served as control. Scale bar, 20 µm.

(C) Gel filtration chromatography of YFP-xBuGZ. Arrowheads, positions of size markers (in kDa): thyroglobulin, apoferritin, amylase, alcohol dehydrogenase, albumin, and carbonic anhydrase. Fractions 7-14 were analyzed by gel electrophoresis and Coomassie staining.

(D) YFP-xBuGZ formed droplets in vitro as visualized by DIC and fluorescence microscopy. Scale bar, 20 µm.

(E) Temperature-dependent droplet formation by YFP-xBuGZ in XB buffer as visualized by Hoffman modulation contrast microscopy. Temperature ramp, 4°C-20°C at 1°C/min. Scale bar, 20 µm.

(F and G) Turbidity assay of reversibility (F) and repeatability (G) of phase transition by YFP-xBuGZ. Increase (4°C-35°C) and decrease (35°C-4°C) in temperature in (F) had the same temperature ramp. The temperature ramp in (G) was 4°C-30°C. Ramp rate, 0.5°C/min.

(H) Concentration-dependent accumulation of YFP-xBuGZ coacervates at the bottom of cuvettes after turbidity measurements. Scale bar, 1 cm.

(I-K) Concentration- (color-coded) and temperature-dependent phase transition of YFP-xBuGZ (I) and YFP-xBuGZΔN (J), but not YFP (K), based on the turbidity assay.

(L and M) xBuGZ5S (L) and xBuGZ13S (M) coacervation at increasingly higher protein concentrations and temperatures.

(N) The temperature at which the turbidity reached half (T1/2) of the difference between the maximum and 4°C absorbance was plotted against log10 protein concentration. YFP-xBuGZ13S did not reach maximum turbidity at 3.125 and 1.56 µM even at 60°C. Error bars, SD from three independent experiments.

See also Figure S3. 
Figure 4. Bundling of MTs by BuGZ Depends on Its MT Binding and Phase Transition

(A) YFP-xBuGZ, but not YFP-BuGZΔN, YFP-BuGZ13S, or YFP, induced MT bundling from Taxol-stabilized and rhodamine-labeled MT seeds.

(B and C) Quantifications of length (B) and average intensity (C) of MTs or MT bundles formed in the presence of indicated proteins and concentrations. MT images were randomly captured under a 63× objective. Approximately 100 individual MTs or MT bundles were measured.

(D) YFP-xBuGZ droplets (green, white arrowheads) along some MT bundles (red). Line scans of tubulin and YFP-BuGZ intensity of the indicated segments of MTs (red arrowheads) are shown.

(E and F) Purified YFP-xBuGZ, but not YFP-xBuGZ13S, showed increased binding to preformed MTs at 37°C compared to 4°C, whereas YFP-xBuGZΔN failed to bind. Quantification is shown in (F).

Error bars, SEM. Student’s t test: *p < 0.05, **p < 0.01, ***p < 0.001, three independent experiments. Scale bars, 10 µm. See also Figure S4.
Figure 5. BuGZ Coacervation Promotes MT Polymerization by Concentrating Tubulin

(A) Tubulin was concentrated in droplets formed by YFP-xBuGZ, but not by YFP-xBuGZΔN. Scale bar, 20 µm.

(B) Anti-His-tag antibody pulled down indicated xBuGZ and tubulin at 4°C.

(C) Illustration of the spin-down assay.

(D) Higher concentrations of YFP-xBuGZ and tubulin are found in droplets (y axis) than in initial solution concentrations (x axis).

(E and F) Compared to initial solution concentrations (x axis), YFP-xBuGZΔN (E) coacervation only concentrated itself, but not tubulin, in droplets (y axis), whereas YFP-xBuGZ13S (F) did not coacervate or concentrate itself or tubulin (y axis).

(G and H) Representative fields of polymerized MTs. MTs were counted in 20 random microscopic fields using a 63× objective. Scale bar, 10 µm.

Error bars, SEM. Student’s t test: ns, not significant, *p < 0.05, **p < 0.01, ***p < 0.001, three independent experiments. See also Figure S5.



endogenous xBuGZ (Figures 6B and S6D). The total BuGZ levels in the matrices, however, remained similar to controls (Figure S6D). After isolated spindle matrices were incubated with 0.1 µM of different YFP-xBuGZs followed by addition of 25 µM tubulin with or without nocodazole, the YFP-xBuGZΔN-incorporated spindle matrices had greatly diminished abilities to



116 Cell 163 , 108-122, September 24, 2015 ©2015 Elsevier Inc. 










Figure 6. Tubulin/MT Binding and Phase Transition of BuGZ Promote Spindle Matrix Assembly and Function 

(A) Incorporation of YFP-xBuGZ into preformed His-xBuGZ droplets in vitro required Fs and Ys but not the N terminus of xBuGZ. YFP intensity was quantified in ~50 droplets.

(B) Incorporation of 0.1 µM YFP-xBuGZ into isolated spindle matrix required Fs and Ys but not the N terminus of xBuGZ. Approximately 30 structures were analyzed.

(C) Incubation of YFP-xBuGZΔN, but not YFP, YFP-xBuGZ, or YFP-xBuGZ13S, with isolated spindle matrix disrupted matrix-mediated MT assembly. Approximately 30 asters were analyzed.

(D) Spindle matrix assembly required MT binding of BuGZ. YFP-xBuGZ, but not YFP-xBuGZΔN, rescued spindle matrix assembly upon endogenous xBuGZ depletion, as assayed by western blotting analyses using spindle matrix markers.

(legend continued on next page) 
concentrate tubulin and to promote MT polymerization (Figures 6C and S6E). YFP-xBuGZ13S did not affect such abilities of the matrices (Figures 6C and S6E), consistent with its failure to incorporate (Figures 6B and S6D). Thus, phase transition and tubulin binding of BuGZ are required for the spindle matrix to concentrate tubulin and promote MT assembly.

Phase Transition and MT Binding of BuGZ Promote 
Spindle Matrix Assembly 

To test whether phase transition of xBuGZ is critical for spindle matrix assembly, extract depleted of endogenous xBuGZ was supplemented with purified YFP-xBuGZ, YFP-xBuGZΔN, or YFP-xBuGZ13S. Alternatively, 2 µM purified BuGZ-B or BuGZ-B3S was added to unperturbed extract. We found that YFP-xBuGZ, but not YFP-xBuGZΔN or YFP-xBuGZ13S, rescued spindle matrix assembly (Figures 6D and 6E). xBuGZ-B, but not xBuGZ-B3S, disrupted spindle matrix assembly (Figure 6E). Therefore, both MT binding and phase transition of BuGZ are required for spindle matrix assembly.

Phase Transition and MT Binding of BuGZ Promote 
Spindle Assembly 

To understand whether phase transition of BuGZ is important for spindle assembly, we used GFP-xBuGZ-A, GFP-xBuGZ-B, and GFP-xBuGZ-C to disrupt phase transition of endogenous xBuGZ in extract. One micromolar of each fragment disrupted spindle assembly induced by Aurora A beads or sperm, whereas their mutants were much less effective (Figures S7A-S7D). When isolated spindles were incubated with 0.2 µM GFP-xBuGZ-A, GFP-xBuGZ-B, or GFP-xBuGZ-C, a concentration that did not disrupt spindle assembly, we found efficient spindle incorporation of these fragments, whereas their mutants incorporated poorly (Figure S7E). Thus, these fragments may bind to endogenous BuGZ to disrupt phase transition and spindle assembly.

Next, we incubated isolated Aurora A-bead spindles with 0.1 µM YFP-xBuGZ or mutants. YFP-xBuGZ, YFP-xBuGZ5S, and YFP-xBuGZ13S showed progressively weaker exchange into spindles (Figure 7A), confirming the importance of BuGZ phase transition in spindle association. YFP-xBuGZΔN exhibited the weakest exchange of all (Figure 7A), consistent with the idea that MT binding of BuGZ facilitates its phase transition. We then depleted endogenous xBuGZ in extract and induced spindle formation by Aurora A beads or sperm. Spindle defects due to xBuGZ depletion were fully rescued by purified xBuGZ, but not by xBuGZΔN or YFP-xBuGZ13S (Figures 7B-7F).

In the predicted DLC region, mBuGZ has 15 extra amino acids (including one Y and one F) not found in xBuGZ (see Figure S3F). We replaced these two residues and the other 13 Fs and Ys with S to create mBuGZ15S. HeLa cells were transfected with control or hBuGZ siRNAs followed by expression of RNAi-insensitive Flag-mBuGZ or Flag-mBuGZ15S. Reduction of MT intensity due to hBuGZ RNAi was rescued by wild-type mBuGZ but not by mBuGZ15S (Figures 7G-7I). Since xBuGZΔN also failed to rescue MT intensity (Figures 1J-1L), both MT binding and phase transition of BuGZ are required to promote spindle assembly in cells.
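The mutant names used throughout (xBuGZ5S, xBuGZ13S, xBuGZ-A3S, mBuGZ15S) appear to record how many F and Y residues in the targeted region were changed to serine. A minimal Python sketch of that substitution is shown below. The toy sequence and region boundaries are hypothetical, since the actual BuGZ sequences are not reproduced here, and the helper name `fy_to_ser` is illustrative only.

```python
"""Hypothetical sketch: F/Y-to-serine substitution within a sequence region."""


def fy_to_ser(sequence, start, end):
    """Replace every F and Y between 1-based positions start..end (inclusive) with S.

    Returns the mutant sequence, the mutated positions, and a name suffix such
    as "3S" that records how many residues were changed.
    """
    residues = list(sequence)
    changed = []
    for pos in range(start, end + 1):
        if residues[pos - 1] in "FY":
            residues[pos - 1] = "S"
            changed.append(pos)
    return "".join(residues), changed, f"{len(changed)}S"


# Toy sequence only; F9 lies outside the targeted region and is left untouched.
mutant, positions, suffix = fy_to_ser("MAFYGSPYFK", 3, 8)
print(mutant, positions, suffix)
```

Restricting the substitution to a region mirrors the design described above, where only F and Y residues in the predicted DLC region were mutated while the rest of the protein was left intact.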

DISCUSSION 

The vague structure-function definitions and uncertain composi- 
tion of the spindle matrix have made its study both challenging 
and controversial. Among the studied spindle matrix proteins, 
LB was initially suggested to be a structural component of the 
matrix because its depletion resulted in reduced spindle matrix 
as judged by markers, such as NuMA and Eg5 (Tsai et al., 
2006). However, due to the difficulties in studying lamins bio- 
chemically, the structural role that LB assumes in the spindle 
matrix remains challenging to decipher. Similarly, despite the 
identification of several spindle matrix proteins in Drosophila,
the assembly mechanism of these proteins remains unknown 
(Johansen and Johansen, 2007). Through analyses of the spindle 
matrix component BuGZ, which can be expressed and purified 
as a soluble protein, we have uncovered its phase transition 
property in spindle and spindle matrix assembly. 

Based on our in vitro studies, we propose that at low tempera- 
ture the DLC region of BuGZ assumes a variety of quasi-folded 
states in solution due to weak intra-molecular hydrophobic inter- 
actions and a water shell surrounding the molecule, which limits 
intermolecular interactions (Figure 7J). Temperature increase 
disrupts (or denatures) the quasi-folded BuGZ and the water shell 
to allow intermolecular interactions, leading to phase transition 
(Figure 7K). By studying phase transition of BuGZ in spindle 
and matrix, we propose that during spindle assembly, the binding 
of N-terminal BuGZ to MTs limits quasi-folding of BuGZ. By 
bringing BuGZ molecules close to one another on MTs, intermo- 
lecular interactions and phase transition of BuGZ are enhanced 
(Figure 7L), which in turn bundles MTs and concentrates tubulin 
(Figure 7M). The elevated tubulin concentration near existing 
MTs then promotes MT polymerization during spindle assembly. 

Our mutational analyses show that BuGZ coacervation depends on highly conserved Fs and Ys in the DLC region of BuGZ. Hydrophobicity-dependent phase transition has been studied in various proteins and polymers. In the well-characterized phase transition of elastin, hydrophobic patches are required for its coacervation, which is critical for subsequent filament assembly (Yeo et al., 2011). Our analyses of the predicted DLC region of BuGZ revealed that highly conserved hydrophobic residues and prolines (P) are enriched in two segments that flank a region with relatively low P content and hydrophobicity. This suggests that intermolecular hydrophobic interactions mediated by the hydrophobic patches contribute to BuGZ coacervation. Consistent with this, BuGZ coacervation is dependent on temperature, protein concentration, and Fs and Ys. The aromatic feature of these residues may also mediate phase transition independent of their hydrophobicity. Inhibition of BuGZ coacervation by the



(E) Spindle matrix assembly required BuGZ coacervation. YFP-xBuGZ, but not YFP-xBuGZ13S, rescued spindle matrix assembly upon endogenous xBuGZ 
depletion. The addition of GFP-BuGZ-B, but not -BuGZ-B3S, into unperturbed extract disrupted spindle matrix assembly. 

Error bars, SEM. Student’s t test: ns, not significant, **p < 0.01, ***p < 0.001, three independent experiments. The numbers of structures quantified are for each experiment and condition. Scale bars, 10 µm. See also Figure S6.
Figure 7. Tubulin/MT Binding and Coacervation of BuGZ Promote Spindle Assembly

(A) Incorporation of 0.1 µM YFP-xBuGZ into isolated spindles required Fs and Ys and MT binding. Approximately 30 spindles were analyzed. Scale bar, 10 µm.

(B-D) xBuGZ, but not xBuGZΔN or xBuGZ13S, rescued astral MT length (B), percentages of normal spindles (C), and length of spindles (D) induced by Aurora A beads. Approximately 50 (B and D) or 500 (C) structures were analyzed.

(E and F) Wild-type xBuGZ, but not xBuGZΔN or xBuGZ13S, rescued defective morphology (E) or MT intensity (F) of spindles in xBuGZ-depleted extracts. Approximately 500 sperm-associated MT structures (E) or 50 spindles (F) were analyzed.

(G-I) Expression of mBuGZ, but not mBuGZ15S, in hBuGZ-depleted HeLa cells (G) rescued normal spindle MT intensity (H and I). Approximately 30 spindles were analyzed. Spindles, centromeres, and chromosomes were stained. Scale bars, 5 µm.

(J-M) Models for BuGZ phase transition in vitro (J and K) or during spindle assembly (L and M). See explanations in the Discussion.

Error bars, SEM. Student’s t test: ns, not significant, *p < 0.05, **p < 0.01, ***p < 0.001 from three independent experiments. The numbers of structures analyzed are for each experiment and condition. See also Figure S7.
xBuGZ fragments containing F and Y could be caused by disruption of the proper alignment of hydrophobic and aromatic residues important for intermolecular BuGZ interactions. BuGZ droplets formed in vitro do not appear to form filaments further, based on our preliminary analyses by electron microscopy (unpublished data). Although additional studies are required to fully understand the biophysical properties that underlie BuGZ phase transition, our findings suggest that unfolding of BuGZ at elevated temperature could drive coacervation through intermolecular hydrophobic interactions.

The temperature-dependent BuGZ phase transition and spindle matrix stability suggest that BuGZ is a structural component of the spindle matrix. Indeed, using known markers of the Xenopus spindle matrix, we show that BuGZ in the spindle matrix exhibits a phase transition property, which is required for assembly and stability of the spindle matrix. The MT-independent and temperature-dependent phase transition of BuGZ that we uncover here also helps to explain why, upon MT depolymerization, the spindle matrix can retain its structural integrity better at RT than on ice. Our findings raise the question of whether phase transition represents a key structural property of the spindle matrix.

Using PONDR and SEG, we found additional DLC-containing 
proteins in the proteome of Xenopus spindle matrix (unpublished 
data). Similar analyses revealed the presence of long stretches of 
DLC regions in several Drosophila spindle matrix proteins, such 
as Megator, Chromator, EAST, and Skeletor (unpublished data). 
Since multiple proteins can undergo phase transition together 
through complicated intermolecular interactions, we speculate 
that the spindle matrix could be a complex coacervate, whose 
formation relies on various combinations of proteins depending 
on organisms and cell types. These coacervates could contain 
different phases to segregate different biochemical reactions 
that could communicate with one another. This idea may explain 
the apparent unrelatedness of some spindle matrix components.
Formation of complex coacervates may involve both
DLC and structured proteins, which could explain why the largely 
structured proteins, such as lamin and actin, are found in the 
spindle and spindle matrix (Zheng and Tsai, 2006; Pickett-Heaps 
and Forer, 2009). Additionally, proteins participating in phase 
transition in the spindle and spindle matrix could undergo rapid 
flux in and out of the structures. The phase transition features 
may explain the lack of a discrete localization of BuGZ on spindle 
MTs and a lack of a clearly defined morphology of the spindle 
matrix upon MT depolymerization. 

We show that phase transition of BuGZ does not require MTs, but the MT binding domain of BuGZ helps not only to facilitate BuGZ phase transition along MTs but also to concentrate tubulin. Both phase transition and MT binding of BuGZ are required for promoting MT polymerization and bundling in vitro. Importantly, we show that all of these properties are also required for BuGZ to promote spindle assembly. Therefore, one functional consequence of BuGZ-mediated spindle matrix assembly along existing MTs appears to be efficient polymerization and bundling of MTs in the spindle. By increasing local tubulin concentration, BuGZ could also promote MT nucleation from microtubule nucleators such as γTuRC (Zheng et al., 1995).

Recent studies have shown that MT-dependent branched MT nucleation facilitates spindle assembly (Petry et al., 2013). Since BuGZ could concentrate tubulin along existing MTs, it would be interesting to further study whether BuGZ-mediated phase transition could help to concentrate other SAFs known to promote branched MT nucleation. Additionally, we have shown that BuGZ interacts with Bub3 to promote the binding of Bub3 to kinetochores in an MT-dependent manner (Jiang et al., 2014). Purified BuGZ-Bub3 complex undergoes more efficient phase transition than BuGZ alone in vitro (unpublished data). It would be interesting to further study whether increasing Bub3 concentration along spindle MTs via phase transition could promote assembly of the Bub3-Bub1-BubR1 complex for its kinetochore loading.

We have shown that nuclear transport receptors such as importin α and β disrupt spindle matrix assembly, which is attenuated by RanGTP (Tsai et al., 2006). Since BuGZ and some other known spindle matrix components are nuclear proteins in interphase, it will be important to further explore whether the RanGTP-importin system regulates phase transition of BuGZ and other spindle matrix proteins in mitosis. Because BuGZ is conserved in both vertebrates and invertebrates, our study here should open a door to further characterizing the structural properties and functions of the spindle matrix in different organisms.

EXPERIMENTAL PROCEDURES 

Expression Vectors 

For all expression constructs, see Table S1.

Cell Culture and Xenopus Egg Extract 

All cells were grown under standard culturing conditions. CSF egg extracts 
were prepared as described before and only those that were tested to support 
spindle assembly were used for further experiments. All assays in egg extracts 
are detailed in the Supplemental Experimental Procedures. 

Immunofluorescence and Quantifications 

Cells were fixed with 4% paraformaldehyde in PBS for 7 min, followed by extraction in 0.5% Triton in PBS for 10 min. Xenopus MT structures were fixed with ice-cold methanol for 5 min. Samples were then blocked in 4% BSA in PBS for >1 hr followed by primary antibody (Table S2) incubation overnight at 4°C. Nikon ECLIPSE E800 or Leica SP5 microscopes were used for imaging. To quantify spindle MT intensity or the ratio of BuGZ and MT immunostaining intensities, metaphase spindles in cells, Aurora A, or sperm MT structures were captured at the same exposure using a 63× objective on the Nikon ECLIPSE E800. Two 15 × 15 pixel regions, corresponding to the brightest area of each half spindle, were chosen based on tubulin or BuGZ intensity, and the average intensities in these areas were determined. The background fluorescence was subtracted using the intensity measured in areas away from the spindle.
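The brightest-window measurement described above can be sketched in pure Python as follows. This is a minimal illustration under assumptions not stated in the text: each half spindle has already been cropped into its own 2D array, the background is a separately cropped region away from the spindle, and the two background-subtracted half-spindle values are averaged into a single spindle value. Function names are hypothetical; a real analysis would more likely use NumPy or ImageJ.

```python
"""Hypothetical sketch: brightest-window spindle intensity with background subtraction."""


def window_means(image, size):
    """Mean of every size x size window, computed with a summed-area table."""
    height, width = len(image), len(image[0])
    # table[i][j] holds the sum of the pixels in rows 0..i-1 and columns 0..j-1
    table = [[0.0] * (width + 1) for _ in range(height + 1)]
    for i in range(height):
        row_sum = 0.0
        for j in range(width):
            row_sum += image[i][j]
            table[i + 1][j + 1] = table[i][j + 1] + row_sum
    area = size * size
    return [
        (table[i + size][j + size] - table[i][j + size]
         - table[i + size][j] + table[i][j]) / area
        for i in range(height - size + 1)
        for j in range(width - size + 1)
    ]


def mean_intensity(image):
    """Average pixel value of a region (used here for the background area)."""
    return sum(map(sum, image)) / (len(image) * len(image[0]))


def spindle_intensity(half_a, half_b, background, size=15):
    """Brightest size x size window in each half spindle, minus background, averaged."""
    bg = mean_intensity(background)
    peaks = [max(window_means(half, size)) - bg for half in (half_a, half_b)]
    return sum(peaks) / len(peaks)


# Synthetic example: uniform halves at 10 and 30, background at 2.
half_a = [[10.0] * 20 for _ in range(20)]
half_b = [[30.0] * 20 for _ in range(20)]
background = [[2.0] * 5 for _ in range(5)]
print(spindle_intensity(half_a, half_b, background))
```

For the BuGZ-to-MT ratio mentioned above, the same function could be applied to both channels and the two results divided; whether the windows were chosen independently per channel or from one channel only is not specified here.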

Protein Expression, Purification, and Interaction 

Proteins were purified using Glutathione agarose (Sigma) or Ni-NTA agarose 
(QIAGEN) according to manufacturer’s protocols. Some proteins were further 
purified by gel filtration. To study tubulin and xBuGZ interaction, cycled tubulin was added to beads coated with purified His-tagged YFP-xBuGZ, YFP-xBuGZΔN, or YFP via antibody to 6His and incubated. See details in the Supplemental Experimental Procedures.

BuGZ Phase Transition 

Purified mBuGZ, xBuGZ, or control proteins were thawed on ice and diluted into ice-cold buffers on ice, followed by incubation at 37°C for 5 min and differential interference contrast (DIC) or fluorescence microscopy. For the turbidity assay, purified proteins were diluted in XB buffer on ice (300 µl final) and then loaded into 750-µl cuvettes (28F-Q-10, Starna Cells) in a cold room. Turbidity was measured at 440 nm using a Cary 300 UV-VIS spectrophotometer with a Peltier-Thermostatted Multi-cell Holder (Agilent Technologies). The temperature ramp rate was 0.5°C/min from 4°C to 60°C. To estimate protein concentrations in BuGZ droplets, 100 µl of protein samples were incubated at 37°C for 5 min followed by centrifugation at 2,000 rpm in a microfuge (Eppendorf E5430) for 5 min at room temperature (RT). Proteins in pellet fractions were quantified by Coomassie blue staining using BSA as standard. See details in the Supplemental Experimental Procedures.
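The T1/2 values plotted in Figure 3N (the temperature at which turbidity reaches half of the difference between the maximum and the 4°C absorbance) could be extracted from each turbidity ramp roughly as sketched below. This is an assumption-laden illustration, not the authors' analysis code: it takes the first reading as the 4°C baseline, uses the observed maximum, and interpolates linearly between neighboring readings. The readings and concentration are hypothetical.

```python
"""Hypothetical sketch: T1/2 from a turbidity-versus-temperature ramp."""
import math


def t_half(temperatures, absorbance):
    """Temperature at which A440 first reaches halfway between the start and maximum.

    The first reading is treated as the 4 °C baseline, and linear interpolation
    is used between neighboring readings. Returns None if the curve never rises.
    """
    baseline = absorbance[0]
    target = baseline + (max(absorbance) - baseline) / 2
    points = list(zip(temperatures, absorbance))
    for (t0, a0), (t1, a1) in zip(points, points[1:]):
        if a0 < target <= a1:
            return t0 + (target - a0) * (t1 - t0) / (a1 - a0)
    return None


def ramp_minutes(start_c=4.0, end_c=60.0, rate_c_per_min=0.5):
    """Duration of a linear temperature ramp, in minutes."""
    return (end_c - start_c) / rate_c_per_min


# Hypothetical A440 readings for one protein concentration (µM).
temps = [4, 10, 20, 30, 40]
a440 = [0.0, 0.0, 0.4, 1.0, 1.0]
concentration_um = 12.5
point = (math.log10(concentration_um), t_half(temps, a440))
print(point, ramp_minutes())
```

Because the sketch uses the observed maximum, it would underestimate T1/2 for curves that never plateau, such as YFP-xBuGZ13S at 3.125 and 1.56 µM, which did not reach maximum turbidity even at 60°C. At the stated ramp rate, one full 4°C-60°C ramp takes 112 min.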

MT Polymerization, Bundling, and Binding 

Purified xBuGZ and cycled tubulin were mixed on ice. MT polymerization was performed at 37°C for 5 min. After fixation, MTs were counted in 20 random microscopic fields using a 63× objective. For bundling, the same amount of rhodamine-MT seeds was mixed with different purified xBuGZ proteins, incubated at 37°C for 5 min, and imaged immediately. MT length and intensity were measured from micrographs. To visualize the binding of wild-type and mutant xBuGZ to MTs, rhodamine-labeled and Taxol-stabilized MTs were mixed with different versions of YFP-xBuGZ. See details in the Supplemental Experimental Procedures.

Sequence Analysis for Protein Disorder and Low Complexity 

The PONDR (http://www.disprot.org/index.php) (Xue et al., 2010) and SEG 
programs (http://mendel.imp.ac.at/METHODS/seg.server.html) (Wootton, 
1994) were used to analyze the DLC regions of BuGZ and other spindle matrix 
proteins at default settings. See details in the Supplemental Experimental 
Procedures. 
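SEG flags low complexity by scoring the compositional (Shannon) complexity of a sliding window. The Python sketch below implements only that trigger step, using the commonly cited default parameters (window 12, trigger complexity 2.2 bits). The full SEG program also extends and trims segments with a second threshold (2.5 bits by default), so this is a simplified approximation rather than a reimplementation, and PONDR's disorder predictors are not modeled at all. The toy sequence is hypothetical.

```python
"""Simplified sketch of the SEG trigger step (not the full SEG algorithm)."""
import math
from collections import Counter


def shannon_bits(window):
    """Compositional complexity of a window, in bits per residue."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


def low_complexity_mask(sequence, window=12, trigger=2.2):
    """Mark every residue covered by at least one window with complexity <= trigger."""
    mask = [False] * len(sequence)
    for start in range(len(sequence) - window + 1):
        if shannon_bits(sequence[start:start + window]) <= trigger:
            for i in range(start, start + window):
                mask[i] = True
    return mask


# Toy sequence: a poly-Q stretch flanked by diverse residues.
seq = "ACDEFGHIKLMN" + "Q" * 12 + "ACDEFGHIKLMN"
mask = low_complexity_mask(seq)
print("".join("x" if m else "." for m in mask))
```

The mask bleeds a few residues into the flanks because windows that are mostly poly-Q still fall below the trigger; SEG's extension and optimization steps exist partly to refine such boundaries.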

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Supplemental Experimental Procedures, seven figures, and two tables and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2015.08.010.
SUMMARY

Stress granules are membrane-less organelles composed of RNA-binding proteins (RBPs) and RNA. Functional impairment of stress granules has been implicated in amyotrophic lateral sclerosis, frontotemporal dementia, and multisystem proteinopathy, diseases that are characterized by fibrillar inclusions of RBPs. Genetic evidence suggests a link between persistent stress granules and the accumulation of pathological inclusions. Here, we demonstrate that the disease-related RBP hnRNPA1 undergoes liquid-liquid phase separation (LLPS) into protein-rich droplets mediated by a low complexity sequence domain (LCD). While the LCD of hnRNPA1 is sufficient to mediate LLPS, the RNA recognition motifs contribute to LLPS in the presence of RNA, giving rise to several mechanisms for regulating assembly. Importantly, while not required for LLPS, fibrillization is enhanced in protein-rich droplets. We suggest that LCD-mediated LLPS contributes to the assembly of stress granules and their liquid properties and provides a mechanistic link between persistent stress granules and fibrillar protein pathology in disease.



INTRODUCTION 

It has recently emerged that cells organize many biochemical processes in membrane-less compartments that have liquid-like properties, exemplified by germ granules in C. elegans and nucleoli in X. laevis (Brangwynne et al., 2009, 2011). It has been proposed that membrane-less organelles arise through a process of liquid-liquid phase separation (LLPS), which permits the requisite components of membrane-less organelles to become rapidly and reversibly concentrated in discrete loci in cells (Hyman et al., 2014). Although the molecular details underlying LLPS in cells are largely obscure, several recent reports indicate that constituent proteins harboring intrinsically disordered, low complexity sequence domains (LCDs) can mediate this process. For example, the RNA helicase DDX4, an LCD-containing constituent of germ granules, forms phase-separated organelles that exhibit liquid properties in vitro and in live cells (Nott et al., 2015). Similarly, LAF-1 undergoes LLPS in vitro and is required for P granule assembly in C. elegans (Elbaum-Garfinkle et al., 2015). Additional RNA/protein assemblies, including stress granules, P bodies, and Cajal bodies, are similarly membrane-less organelles that exhibit liquid properties and may assemble by LLPS (Hyman et al., 2014; Wippich et al., 2013).

Stress granules are membrane-less cytosolic bodies composed of mRNAs and proteins that assemble when translation initiation is limiting and are thought to represent a pool of mRNPs stalled in the process of translation initiation (Anderson and Kedersha, 2009; Buchan and Parker, 2009). A wealth of genetic evidence has emerged over the past 5 years implicating stress granules as a subcellular compartment that is central to the pathogenesis of a closely related set of degenerative diseases, including amyotrophic lateral sclerosis (ALS), frontotemporal dementia (FTD), and inclusion body myopathy (IBM) (Li et al., 2013; Ramaswami et al., 2013). These degenerative diseases are characterized pathologically by cytoplasmic inclusions composed of fibrillar deposits of heterogeneous nuclear ribonucleoproteins (hnRNPs) in affected cells (Kim et al., 2013; Ramaswami et al., 2013). Conspicuously, inherited forms of ALS, FTD, and myopathy are often caused by missense mutations affecting hnRNPs, such as TDP-43, FUS, hnRNPA1, hnRNPA2B1, hnRNPDL, and TIA-1 (Kim et al., 2013; Klar et al., 2013; Kwiatkowski et al., 2009; Sreedharan et al., 2008; Vieira et al., 2014). These hnRNPs are all components of stress granules, and disease-causing mutations in these proteins are associated with accumulation of persistent stress granules (Bosco et al., 2010; Hackman et al., 2013; Kim et al., 2013). ALS, FTD, and myopathy are also caused by mutations in VCP/p97, which are associated with impaired autophagic clearance of stress granules (Buchan et al., 2013). ALS-causing mutations in the actin-binding protein Profilin 1 similarly impair stress granule dynamics (Figley et al., 2014). Thus, a variety of genetic and cell biological insights have focused attention on altered stress granule dynamics as a key defect in the pathogenesis of ALS, FTD, and myopathy, yet the mechanism that leads to accumulation of fibrillar hnRNP pathology remains obscure.

hnRNPA1 is a prototypical hnRNP consisting of two folded RNA recognition motifs (RRMs) that occupy the N-terminal half of the protein and an LCD that occupies the C-terminal half. Missense mutations in the LCD of hnRNPA1 cause ALS and multisystem proteinopathy (MSP), a pleiotropic degenerative disorder affecting muscle and brain (Kim et al., 2013). hnRNPA1 and closely related hnRNPs exhibit an intrinsic propensity to assemble into amyloid-like fibrils containing cross-β structure, and this property has been proposed to mediate stress granule assembly (Kato et al., 2012). However, stress granules are dynamic assemblies; their components have residence times varying between seconds and minutes, and indeed, the assembly and disassembly of entire granules are accomplished on this same timescale (Buchan and Parker, 2009). These rapid dynamics argue in favor of a mechanism that permits rapid assembly and disassembly, such as LLPS, and suggest that, rather than accounting for their assembly, fibrillar species formed by hnRNPA1 and related hnRNPs may represent specialized components that accrue within stress granules. Here, we demonstrate that the RBP hnRNPA1 undergoes LLPS mediated by the LCD to form protein-rich droplets. While the LCD of hnRNPA1 is sufficient to mediate phase separation, the folded RNA recognition motifs contribute to phase separation in the presence of RNA, giving rise to several mechanisms for regulating assembly. Importantly, while not required for phase separation, fibrillization is enhanced in protein-rich droplets. These results suggest that LCD-mediated LLPS contributes to the assembly of stress granules and their liquid properties and reveal the mechanistic link between persistent stress granules and fibrillar protein pathology in disease.

RESULTS 

hnRNPA1 Undergoes Liquid-Liquid Phase Separation

To gain insight into the role of individual RBPs with LCDs in the assembly of stress granules, we expressed and purified hnRNPA1 and TDP-43 as fusions with solubility-enhancing His-SUMO tags (His-SUMO-hnRNPA1 and His-SUMO-TDP-43). Importantly, this purification always included careful RNA digestion followed by ion exchange and gel filtration chromatography to remove all nucleotides (Figure S1). The hnRNPA1 solution exhibited temperature-dependent, reversible turbidity (Figure 1A), which differential interference contrast (DIC) microscopy revealed to reflect the presence of numerous droplets (Figure 1B). The His-SUMO-TDP-43 solution was also turbid due to the presence of a multitude of small droplets (Figure 1C). The formation of hnRNPA1 droplets was induced by a decrease in temperature, was rapidly reversible, and required a minimum protein concentration that depended on temperature (Movie S1). Droplets of hnRNPA1 exhibited wetting when they encountered the surface of the coverslip, suggesting liquid properties (Figure 1D and Movie S2). To further probe the nature of the droplets, we fluorescently labeled hnRNPA1 by conjugation to Oregon Green and observed that these protein droplets tended to fuse rapidly into larger droplets within seconds, further reflecting liquid properties (Figure 1E and Movie S3). Removal of the His-SUMO tag from hnRNPA1 led to the same observations, demonstrating that properties intrinsic to hnRNPA1 mediate its assembly into droplets (Figure S2). We assessed the mobility of hnRNPA1 molecules between the droplet and bulk phases by fluorescence recovery after photobleaching (FRAP) measurements (Figure 1F). After photobleaching a single droplet, the majority of its fluorescence signal (~80%) recovered with a characteristic recovery time of 3.7 s (Table S1). These data demonstrate that hnRNPA1 is highly dynamic, with rapid exchange of molecules between the droplets and the surrounding solution. The appearance of this second liquid phase in a temperature- and protein-concentration-dependent manner is consistent with LLPS by hnRNPA1 as described by Flory-Huggins theory (Flory, 1942; Huggins, 1942). hnRNPA1 is also able to assemble into hydrogels composed of uniformly polymerized amyloid-like fibers (Kato et al., 2012).
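The characteristic recovery time quoted above comes from fitting an exponential model to the normalized FRAP curve. The Python sketch below illustrates the idea under simplifying assumptions: a single-exponential recovery F(t) = M(1 - exp(-t/τ)) with a known mobile fraction M, whereas the Figure 1F legend reports a double-exponential fit and a real analysis would fit all parameters jointly by nonlinear least squares. The function names and the noise-free synthetic data (M = 0.8 and τ = 3.7 s, echoing the values reported above) are illustrative only, not the authors' analysis pipeline.

```python
import math


def frap_recovery(t, mobile_fraction, tau):
    """Normalized single-exponential FRAP model: F(t) = M * (1 - exp(-t / tau))."""
    return mobile_fraction * (1.0 - math.exp(-t / tau))


def fit_tau_loglinear(times, intensities, mobile_fraction):
    """Estimate tau from the linearized model ln(1 - F/M) = -t / tau.

    Assumes the mobile fraction M is already known (e.g., from the plateau).
    Fits a line through the origin by least squares: slope = sum(x*y) / sum(x*x).
    """
    xs, ys = [], []
    for t, f in zip(times, intensities):
        residual = 1.0 - f / mobile_fraction
        if residual > 0:  # points at or above the plateau have no defined log
            xs.append(t)
            ys.append(math.log(residual))
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return -1.0 / slope


# Synthetic, noise-free recovery sampled every 0.5 s for 20 s (illustrative values).
times = [0.5 * i for i in range(1, 41)]
signal = [frap_recovery(t, 0.8, 3.7) for t in times]
tau_est = fit_tau_loglinear(times, signal, 0.8)
half_time = tau_est * math.log(2)  # t_1/2 = tau * ln 2 for a single exponential
print(f"tau = {tau_est:.2f} s, t_1/2 = {half_time:.2f} s")
```

The log-linear trick is convenient for exposition, but with noisy data it weights late, near-plateau points poorly, which is one reason nonlinear fitting is usually preferred.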

We produced hydrogels from purified His-SUMO-hnRNPA1 according to the protocol of Kato et al. (2012): purified hnRNPA1 was dialyzed at 4°C overnight, sonicated, concentrated, and incubated for 48 hr at room temperature, resulting in hydrogel formation. Although hnRNPA1 hydrogels exhibit dynamic properties (Kato et al., 2012), they did not show any detectable fluorescence recovery after photobleaching in experiments lasting >15 min (Figures 1G and 1H), demonstrating that hnRNPA1 is more rigidly incorporated into hydrogels than into liquid droplets. These data agree with the report that hnRNPA1 hydrogels are composed of cross-β fibrils, which may represent a thermodynamically stabilized or kinetically trapped state of the protein. Whereas hnRNPA1 droplets showed a wide size distribution and grew over time by fusion events, TDP-43 droplets were similar in size, with an upper limit of ~1 μm, and often appeared in strings, as if fusion into larger droplets was initiated but did not proceed (Movie S4). Their spherical shape suggested that they also formed by LLPS, but because the material properties of TDP-43 droplets appeared more complex than those of a classic liquid, we focused on the biophysical properties of hnRNPA1 going forward.

LLPS has been proposed as the molecular mechanism underlying the formation of membrane-less cellular bodies that exhibit liquid properties, such as P granules and nucleoli (Brangwynne et al., 2009; Elbaum-Garfinkle et al., 2015; Fromm et al., 2014; Li et al., 2012; Nott et al., 2015). We observed that stress granules in cells exhibit liquid properties, regularly fusing into larger structures (Figure 1I and Movie S5). Moreover, hnRNPA1 in stress granules is in dynamic equilibrium with the surrounding cytosol, as illustrated by FRAP measurements showing a recovery time (4.2 s) similar to that of purified hnRNPA1 in liquid droplets (Figure 1J). These classic liquid properties suggest that stress granules represent a separate liquid phase formed via LLPS.

Figure 1. hnRNPA1 Spontaneously and Reversibly Assembles into Liquid Droplets

(A) Test tubes containing 500 μM BSA or 500 μM His-SUMO-hnRNPA1, respectively, were alternated between 4°C and 25°C.

(B) Transparent BSA or turbid hnRNPA1 solution observed by differential interference contrast (DIC) microscopy at 10°C.

(C) TDP-43 droplets observed by DIC at 25°C.

(D) 300 μM hnRNPA1 in 100 mM NaCl exhibited wetting at the surface of the coverslip. Images were extracted from Movie S2.

(E) Fluorescence micrographs of hnRNPA1 (spiked with Oregon-Green-labeled hnRNPA1 at a molar ratio of 300:1) in 150 mM NaCl buffer at 10°C reveal that the protein is enriched in the droplets; the droplets fuse over time. The main image and the panel on the right were extracted from Movie S3.

(F) FRAP of fluorescently labeled/unlabeled hnRNPA1 at a molar ratio of 1:300. The black curve is an average of FRAP events from nine distinct droplets; the error bars represent the SE. The red curve corresponds to a double exponential fit of the data. The two characteristic recovery times are 3.72 s and 31.6 s. See also Table S1.

(G) An area of hydrogel (white arrow) was photobleached over the course of 60 s. Fluorescence intensity decreased, but no recovery was observed. The yellow arrow indicates an area of hydrogel photobleached 15 min before.

(H) FRAP of hydrogels. The black curve is an average of FRAP events from three different hydrogel pieces; the error bars represent the SE.

(I) Live imaging of U2OS cells expressing G3BP-GFP. The cells were stressed for 1 hr with 0.5 mM arsenite, and stress granule formation was observed. Stress granules fused over time. The main image and the panel on the right were extracted from Movie S5.

(J) FRAP of hnRNPA1 in stress granules. The black curve is an average of FRAP events from 12 distinct stress granules from 12 distinct cells; the error bars represent the SE. The red curve corresponds to a single exponential fit of the data. The characteristic recovery time is 4.21 s. See also Table S1.

The LCD of hnRNPA1 Mediates Liquid-Liquid Phase Separation and Is Sufficient for Incorporation into Stress Granules

In order to map the domains responsible for LLPS of hnRNPA1, we engineered His-SUMO fusion constructs containing either the folded N-terminal RNA recognition motifs (A1-RRM) or the C-terminal LCD (A1-LCD) (Table S2), which is predicted to be intrinsically disordered (Figures 2A and S3). The A1-LCD alone formed liquid droplets, whereas A1-RRM failed to undergo LLPS under conditions comparable to those used for full-length hnRNPA1 (A1-FL) and under all other conditions tested (Figure 2B).

hnRNPA1 residues 259-264 constitute a steric zipper motif centered in the LCD and are essential to hnRNPA1's intrinsic tendency to fibrillize (Kim et al., 2013). Importantly, the corresponding deletion mutant (A1-Δhexa), which does not fibrillize (Kim et al., 2013), readily underwent LLPS, demonstrating that LLPS and fibrillization are two mechanistically distinct processes (Figures 2A and 2B). To test the role of LCD-mediated LLPS in the formation of stress granules, we transiently expressed the wild-type, GFP-tagged LCD of hnRNPA1 (GFP-LCD) or a version lacking residues 259-264, which constitute the steric zipper (GFP-LCD Δhexa), in HeLa cells (Figure 2C). Both proteins were efficiently incorporated into stress granules, suggesting that stress granule assembly does not require fibrillization and hence is distinct from hydrogel formation (Figures 2D and 2E).

Figure 2. Liquid-Liquid Phase Separation by hnRNPA1 Is Mediated by the C-Terminal Low Complexity Sequence Domain and Is Distinct from Fibrillization

(A) Schematic of the structure of full-length hnRNPA1 (A1-FL), the N terminus comprising the two folded RNA recognition motifs (A1-RRM), the low complexity sequence domain (A1-LCD), and the mutant with a deletion of residues 259-264 (Kim et al., 2013) (A1-Δhexa).

(B) DIC images of A1-FL, A1-RRM, A1-LCD, and A1-Δhexa at 140 μM protein, 150 mg/ml Ficoll in 50 mM HEPES, 300 mM NaCl, and 5 mM DTT.

(C) Schematic of the constructs transiently expressed in HeLa cells.

(D) Representative confocal microscopy images of HeLa cells transfected with constructs presented in (C), treated with 0.5 mM sodium arsenite for 15 min, and immunostained with anti-eIF4G (red) and DAPI (blue).

(E) Quantification for data in (D). The percentage of transfected cells displaying GFP signal in SGs ([number of cells with GFP-positive SGs/number of GFP-expressing cells] × 100) was plotted as mean ± SEM; n = 100 cells; **p < 0.005, ***p < 0.001 by one-way ANOVA, Tukey's post hoc test.

Liquid-Liquid Phase Separation by hnRNPA1 Is Based on Weak Interactions

To gain further insight into the intermolecular interactions underlying LLPS by hnRNPA1, we measured the temperature at which droplets first formed as a function of protein concentration and molecular crowding, allowing the construction of a phase diagram (Figure 3A). Although LLPS by hnRNPA1 occurs spontaneously in a temperature- and protein-concentration-dependent manner in the absence of a crowding agent (Figure S4A), we mapped the phase diagram in the presence of Ficoll, a typical crowding agent, to mimic the crowded cellular environment, which is thought to contain ~200 mg/ml of macromolecules (Ellis, 2001). Ficoll was used for most experiments, but polyethylene glycol (PEG) was also able to promote hnRNPA1 LLPS (Figure S4B). These data demonstrated that the propensity for phase separation increased with molecular crowding and that the hnRNPA1 concentration necessary for phase separation dropped substantially as crowding approached intracellular levels. From the shape of the phase diagram, we conclude that hnRNPA1 has an upper critical solution temperature (UCST), i.e., a critical temperature exists above which the two-phase regime cannot be accessed. A UCST phase diagram indicates that LLPS is driven mostly by enthalpy, with favorable interactions between protein molecules mediating assembly (Flory, 1942; Huggins, 1942). We also observed that lowering the NaCl concentration led to LLPS at lower A1-FL concentrations, suggesting that electrostatic interactions contribute to LLPS. Again, the hnRNPA1 concentration necessary for phase separation dropped substantially as the salt concentration approached intracellular levels (Figure 3B). Interestingly, LLPS by A1-FL was disrupted by 1,6-hexanediol, a compound that disables the selectivity filter of the nuclear pore complex by disrupting interactions among phenylalanines in the FG repeats (Patel et al., 2007; Ribbeck and Görlich, 2002), suggesting that aromatic residues in the LCD contribute to LLPS of hnRNPA1 (Figure 3C). These results indicate that multiple types of favorable molecular interactions contribute to hnRNPA1 LLPS.
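The link between a UCST-shaped phase diagram and enthalpy-driven demixing can be made concrete with the Flory-Huggins mean-field free energy cited above. In the minimal sketch below, the protein is idealized as a homopolymer of N segments in a monomeric solvent, and the interaction parameter is given the commonly assumed empirical form χ(T) = A + B/T; the parameter values are hypothetical and not fitted to hnRNPA1. A positive B (favorable protein-protein contacts, i.e., an enthalpic driving force) makes χ grow on cooling, so the solution at the critical composition becomes unstable only below a critical temperature, which is the UCST behavior described above.

```python
import math


def chi_critical(n):
    """Critical Flory-Huggins parameter for an n-segment chain: 0.5 * (1 + 1/sqrt(n))**2."""
    return 0.5 * (1.0 + 1.0 / math.sqrt(n)) ** 2


def phi_critical(n):
    """Critical polymer volume fraction: 1 / (1 + sqrt(n))."""
    return 1.0 / (1.0 + math.sqrt(n))


def free_energy_curvature(phi, n, chi):
    """Second derivative of f = (phi/n) ln phi + (1 - phi) ln(1 - phi) + chi phi (1 - phi).

    f is the mixing free energy per lattice site in units of kT; a negative
    curvature means the homogeneous solution is locally unstable (spinodal region).
    """
    return 1.0 / (n * phi) + 1.0 / (1.0 - phi) - 2.0 * chi


def chi_of_t(temp, a, b):
    """Assumed empirical temperature dependence chi(T) = A + B / T."""
    return a + b / temp


def ucst(n, a, b):
    """Temperature at which chi(T) reaches chi_c; demixing occurs below it when b > 0."""
    return b / (chi_critical(n) - a)


N, A, B = 100, 0.0, 181.5  # hypothetical values chosen so that T_c is about 300 K
t_c = ucst(N, A, B)
for temp in (290.0, 310.0):
    curvature = free_energy_curvature(phi_critical(N), N, chi_of_t(temp, A, B))
    state = "two-phase (unstable)" if curvature < 0 else "one-phase (stable)"
    print(f"T = {temp:.0f} K: {state}")
```

Cooling below T_c flips the sign of the curvature at the critical composition; if B were negative (an entropy-driven, LCST-type system), the same code would show demixing on heating instead.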

Increasing the Cytoplasmic Concentration of Endogenous LCD-Containing hnRNPs Is Sufficient to Drive Stress Granule Assembly

To manipulate the cytoplasmic concentration of endogenous hnRNPA1 and related hnRNPs in cells, we transiently expressed the M9M peptide in HeLa cells. M9M was designed to have a significantly greater affinity for Karyopherin-β2 than the natural PY-NLSs present in hnRNPA1 and several closely related RNA-binding proteins (Bernis et al., 2014; Cansizoglu et al., 2007). As a result, M9M prevents Karyopherin-β2 from binding a select subset of PY-NLS-containing endogenous clients, which then accumulate in the cytoplasm (Cansizoglu et al., 2007; Dormann et al., 2012). We observed that transient expression of YFP-M9M in HeLa cells increased the cytoplasmic concentration of hnRNPA1 and related LCD-containing RNA-binding proteins (hnRNPA2 and FUS) and increased stress granule assembly compared with cells transfected with YFP only (Figures 3D, 3E, S5A, and S5B). Together with the in vitro data, these observations suggest that concentration-dependent LLPS drives stress granule assembly and requires a threshold protein concentration.
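The notion of a threshold concentration follows from the lever rule for an idealized two-phase system at fixed temperature: below the saturation concentration the solution stays homogeneous, and above it any additional protein partitions into a dense phase whose volume fraction grows linearly with total concentration. The Python sketch below uses hypothetical dilute- and dense-phase concentrations (C_SAT, C_DENSE) rather than measured values, and it treats the cytoplasm as a two-component mixture, which is a strong simplification of a multicomponent stress granule.

```python
def dense_phase_volume_fraction(c_total, c_sat, c_dense):
    """Lever rule: fraction of volume occupied by the dense (droplet) phase.

    c_sat is the coexisting dilute-phase concentration (the threshold) and
    c_dense the coexisting dense-phase concentration, both at fixed temperature.
    """
    if c_total <= c_sat:
        return 0.0  # below threshold: one homogeneous phase, no droplets
    if c_total >= c_dense:
        return 1.0  # entire volume in the dense phase
    return (c_total - c_sat) / (c_dense - c_sat)


# Hypothetical tie-line concentrations (micromolar), not measured values.
C_SAT, C_DENSE = 10.0, 1000.0
for c in (5.0, 109.0, 505.0):
    v = dense_phase_volume_fraction(c, C_SAT, C_DENSE)
    print(f"c_total = {c:6.1f} uM -> dense-phase volume fraction {v:.3f}")
```

In this idealization, raising the cytoplasmic concentration (as M9M does) has no effect until the threshold is crossed, after which the amount of condensed material grows steadily.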

Figure 3. Molecular Crowding, Electrostatic and Hydrophobic Interactions, and Increased Cytoplasmic Concentration of hnRNPs Contribute to Liquid-Liquid Phase Separation of hnRNPA1

(A) Phase diagrams of hnRNPA1 in 50 mM HEPES, 300 mM NaCl, 5 mM DTT. The apparent cloud point, i.e., the temperature at which droplets were first observed, was determined as a function of protein concentration and molecular crowding. Each point represents the mean of a triplicate ± SD. The solid curve represents a fit to a relation for binary demixing that describes the shape of the coexistence curve (Muschol and Rosenberger, 1997; Sengers, 1980; Stanley, 1971).

(B) Protein/NaCl concentration pairs scoring positive (green circles) or negative (red diamonds) for the appearance of droplets. The experiment was performed in 100 mg/ml Ficoll at 10°C.

(C) DIC images of 100 μM hnRNPA1 and 150 mg/ml Ficoll at 10°C; the solution returns to the one-phase regime upon the addition of 5% 1,6-hexanediol.

(D) Confocal microscopy images of HeLa cells transfected with YFP or YFP-M9M and immunostained with anti-hnRNPA1 (red) and anti-eIF4G (purple). The insets show hnRNPA1. See also Figure S5.

(E) Quantification for data in (D). The percentage of transfected cells displaying SGs was plotted as mean ± SEM; n = 100 cells; **p < 0.005 by one-way ANOVA, Tukey's post hoc test.

RNA Facilitates Liquid-Liquid Phase Separation by hnRNPA1 by Binding to RRMs and LCD

Stress granules are enriched in RNA-binding proteins and translationally stalled mRNAs (Kedersha and Anderson, 2002). hnRNPA1 contains two RNA recognition motifs (Figure 2A) that have been shown to bind RNA (Burd and Dreyfuss, 1994); thus, we tested whether association with RNA plays a role in the phase separation properties of hnRNPA1. CLIP-seq has shown that hnRNPA1 binds >1,000 RNA species through a relatively short, degenerate sequence motif (Huelga et al., 2012). Thus, rather than engineer a specific sequence, we used a random oligonucleotide RNA sequence. Fluorescently labeled RNA (fl-RNA44) was recruited into the protein-dense droplets formed by hnRNPA1 (Figure 4A). Notably, the addition of RNA substantially decreased the hnRNPA1 concentration required for phase separation, to as low as 500 nM, well within the estimated intracellular concentration of hnRNPA1 (Figure 4B and Supplemental Information). The increased propensity for LLPS in the presence of RNA suggested the formation of larger hetero-oligomers. Indeed, although A1-RRM alone did not undergo LLPS under any conditions when tested in isolation, it readily phase separated in the presence of RNA (Figure 4C). The two RRM domains and multiple binding motifs for RRMs on the RNA likely mediate weak multivalent interactions that lead to LLPS. Droplets formed by A1-LCD also recruited RNA (Figure 4C), indicating that the LCD of hnRNPA1 binds RNA, as shown previously (Mayeda et al., 1994). Indeed, using fluorescence anisotropy, we confirmed that A1-FL, A1-RRM, and A1-LCD interact with fl-RNA44 with micromolar affinity (Figure 4D). In this experiment, we added increasing concentrations of the indicated proteins to fl-RNA44. Protein binding slows the tumbling of the labeled RNA, which is detected as an increase in fluorescence anisotropy; the inflection point of the binding curve corresponds approximately to the dissociation constant of the interaction. Thus, RNA can bind the RRM domains as well as the LCD of hnRNPA1, and this multivalency likely results in the formation of large higher-order complexes that promote LLPS of hnRNPA1 more efficiently than the LCD alone. Our findings suggest a mechanism by which multivalent interactions between RNA and some RNA-binding proteins may contribute to the formation of stress granules.
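The anisotropy titration described above can be modeled as simple 1:1 binding. As a minimal sketch, assuming a single binding site, equal fluorescence brightness of free and bound RNA (so that the observed anisotropy is a population-weighted average), and the exact quadratic solution that accounts for depletion of free protein, the fraction of labeled RNA bound reaches one-half when the protein concentration approaches K_d, provided the RNA concentration is well below K_d. The function names, K_d, and anisotropy values are hypothetical; the direct binding model the authors used (Roehrl et al., 2004) may differ in detail.

```python
import math


def anisotropy(i_parallel, i_perpendicular, g_factor=1.0):
    """Steady-state anisotropy r = (I_par - G*I_perp) / (I_par + 2*G*I_perp)."""
    return (i_parallel - g_factor * i_perpendicular) / (
        i_parallel + 2.0 * g_factor * i_perpendicular
    )


def fraction_rna_bound(p_total, r_total, kd):
    """Exact 1:1 binding (quadratic solution), allowing for depletion of free protein."""
    s = p_total + r_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * p_total * r_total)) / 2.0
    return complex_conc / r_total


def observed_anisotropy(p_total, r_total, kd, r_free, r_bound):
    """Population-weighted anisotropy, assuming equal brightness of free and bound RNA."""
    fb = fraction_rna_bound(p_total, r_total, kd)
    return r_free + (r_bound - r_free) * fb


# Hypothetical titration: 10 nM labeled RNA, Kd = 5 uM (all concentrations in uM).
RNA, KD, R_FREE, R_BOUND = 0.01, 5.0, 0.05, 0.20
for protein in (0.5, 5.0, 50.0):
    r = observed_anisotropy(protein, RNA, KD, R_FREE, R_BOUND)
    print(f"[protein] = {protein:5.1f} uM -> r = {r:.3f}")
```

Plotted against the logarithm of protein concentration, this curve has its inflection point near K_d, which is why the midpoint of such a titration gives an approximate dissociation constant.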

Figure 4. RNA Facilitates Liquid-Liquid Phase Separation of hnRNPA1 by Binding to RRMs and LCD

(A) DIC/fluorescence images of 120 μM hnRNPA1 mixed with 1.2 μM fluorescein-labeled RNA at 10°C. The samples of purified hnRNPA1 were RNA free (Figure S1).

(B) Phase diagram of hnRNPA1 as a function of protein concentration and RNA concentration. Red and green symbols indicate that the sample was in the one-phase or the two-phase regime, respectively. The experiment was performed in 50 mM HEPES, 150 mM NaCl, 5 mM DTT, and 150 mg/ml Ficoll at 10°C.

(C) Fluorescence images of 100 μM fluorescein-labeled RNA mixed with 100 μM A1-RRM or A1-LCD at 10°C.

(D) A1-FL, A1-RRM, and A1-LCD binding to RNA was monitored by changes in fluorescence anisotropy of 5'-fluorescein-labeled RNA (fl-RNA44). Symbols represent experimental data points, and solid lines are non-linear least-squares fits to a direct binding model (Roehrl et al., 2004). Importantly, LLPS did not occur under these conditions; the increase in fluorescence anisotropy is therefore caused by direct binding, not partitioning of RNA into droplets.

Disease-Causing Mutation to hnRNPA1 Does Not Significantly Alter Liquid-Liquid Phase Separation Properties

The disease-causing hnRNPA1 mutation D262V has been associated with increased stress granule assembly, as well as formation of hnRNPA1 fibrils, but the relationship between these observations was unclear (Kim et al., 2013). To examine the impact of disease mutation on LLPS of hnRNPA1, we expressed and purified His-SUMO-hnRNPA1-D262V (A1-D262V) and determined that this mutant protein undergoes spontaneous temperature- and concentration-dependent LLPS to create liquid droplets that were morphologically indistinguishable from liquid droplets formed by wild-type protein (Movie S6). FRAP of A1-D262V in liquid droplets showed a recovery time (2.9 s) similar to that of wild-type hnRNPA1, indicating that the mutation did not significantly impact dynamic exchange with the surrounding solution, at least on the timescales examined (Figure 5A). We mapped the phase diagram of mutant hnRNPA1 in the presence of molecular crowder (Ficoll at 100 mg/ml) and 300 mM NaCl and found no significant differences from that of wild-type hnRNPA1 (Figure 5B). Finally, we observed that wild-type and mutant hnRNPA1 were miscible in droplets (Figure 5C). Together, these results suggest that the disease-causing mutation does not significantly impact the interactions that drive phase separation.
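The phase diagrams in Figures 3A and 5B were fit to a relation for binary demixing near a critical point. One commonly used form, assumed here purely for illustration, writes the half-width of the coexistence curve as a power law in reduced temperature with a three-dimensional Ising exponent β ≈ 0.325. Making the curve symmetric about the critical concentration is a further simplification, and the exact relation and parameters the authors fitted are not reproduced here. The Python sketch computes the coexisting concentrations below T_c and inverts the relation to give the cloud-point temperature of a sample at a given concentration; all parameter values are hypothetical.

```python
BETA = 0.325  # assumed 3D Ising-class critical exponent


def coexisting_concentrations(temp, t_c, c_c, amplitude):
    """Dilute and dense branch concentrations at temp, using the assumed form
    |c - c_c| / c_c = amplitude * ((t_c - temp) / t_c) ** BETA (symmetric about c_c)."""
    if temp >= t_c:
        return (c_c, c_c)  # at or above T_c there is no two-phase region (UCST)
    half_width = amplitude * c_c * ((t_c - temp) / t_c) ** BETA
    return (c_c - half_width, c_c + half_width)


def cloud_point(conc, t_c, c_c, amplitude):
    """Temperature at which a sample of concentration conc first demixes on cooling."""
    x = abs(conc - c_c) / (amplitude * c_c)
    return t_c * (1.0 - x ** (1.0 / BETA))


# Hypothetical parameters: T_c = 300 K, c_c = 100 uM, amplitude = 1.5.
T_C, C_C, AMP = 300.0, 100.0, 1.5
low, high = coexisting_concentrations(290.0, T_C, C_C, AMP)
print(f"At 290 K: dilute branch {low:.1f} uM, dense branch {high:.1f} uM")
print(f"Cloud point of the dense-branch sample: {cloud_point(high, T_C, C_C, AMP):.1f} K")
```

Measured cloud points (one temperature per concentration) are exactly what `cloud_point` returns, so fitting T_c, c_c, and the amplitude to such data is one way to summarize a phase diagram like those in Figures 3A and 5B.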

Liquid-Liquid Phase Separation Promotes Fibrillization of the Disease-Causing Mutant

While examining the LLPS properties of A1-D262V, we noted that reversible droplet assembly was accompanied by the accumulation of insoluble precipitate on the coverslip surface (Figure 6A). After the temperature was raised and the droplets dispersed, the surface of the coverslip was found to be blanketed with Thioflavin-T (ThT)-positive fibrils (Figure S6). This phenomenon occurred within minutes of droplet assembly by A1-D262V and was not observed with wild-type hnRNPA1 or A1-Δhexa. We tested the fibrillization propensity of the His-SUMO-tagged proteins under agitation at 25°C by ThT fluorescence and observed substantially greater fibrillization of the mutant compared to wild-type, whereas hnRNPA1-Δhexa did not fibrillize over a period of 24 hr (Figure 6B). Fibrillization by A1-D262V under these conditions occurred on a timescale of hours, consistent with previous results (Kim et al., 2013). To observe the effect of LLPS on fibrillization, we mixed purified wild-type and mutant hnRNPA1 under conditions that allowed LLPS and noted that fibrils of mutant hnRNPA1 formed almost immediately in the floating condensed liquid droplets (Figure 6C) and gradually deposited on the coverslip surface (Figure 6D). Interestingly, mutant hnRNPA1 fibrils eventually seeded the assembly of wild-type hnRNPA1, resulting in mixed fibrils on the coverslip surface (Figure 6D), consistent with the previous observation that preformed fibrils of hnRNPA1 can seed assembly (Kim et al., 2013). To further illustrate the role of phase separation in driving fibrillization, we examined the temporal correlation between the onset of fibrillization and LLPS. The protein was either held in the one-phase regime (33°C) or shifted into the two-phase regime (16°C) by lowering the temperature at different time points (0 min or 20 min). Mutant hnRNPA1 always formed fibrils immediately upon LLPS, at either the early or the late time point, but never formed fibrils when the protein was maintained in the one-phase regime over the same period (Figure 6E). Taken together, our findings demonstrate that LLPS increases the propensity of hnRNPA1-D262V to form amyloid-like fibrils, likely by increasing the local protein concentration in droplets and thereby enhancing nucleation. Wild-type protein in the droplets is co-recruited into these fibrils, and the same is presumably true of other aggregation-prone RBPs in stress granules, such as TDP-43.

Figure 5. Disease-Causing Mutant Has Liquid-Liquid Phase Separation Properties Similar to the Wild-Type

(A) FRAP of fluorescently labeled/unlabeled A1-D262V at a molar ratio of 1:300. The black curve is an average of FRAP events from nine distinct droplets; the error bars represent the SE. The red curve corresponds to a double exponential fit of the data. The two characteristic recovery times are 2.86 s and 23.2 s. See also Table S1.

(B) Phase diagrams of wild-type hnRNPA1 and hnRNPA1-D262V. The apparent cloud point, i.e., the temperature at which droplets were first observed, was determined as a function of protein concentration. Each point represents the mean of a triplicate ± SD. The solid curve represents a fit to a relation for binary demixing from renormalization-group theory. WT data are replotted from Figure 3A.

(C) Fluorescence images of Oregon-Green-labeled/unlabeled wild-type hnRNPA1 mixed with Rhodamine-Texas Red-labeled/unlabeled A1-D262V (both at molar ratios of 1:300) at 10°C.

DISCUSSION

We showed that the RBPs hnRNPA1 and TDP-43, disease-related proteins that are typical components of stress granules, are able to assemble into protein-rich droplets via LLPS (Figure 1). The LCD of hnRNPA1 mediates LLPS in vitro and is sufficient for recruitment into stress granules in cells (Figure 2). LLPS of hnRNPA1 is tunable by environmental conditions; specifically, lower salt concentration, molecular crowding, and interaction with RNA all reduce the protein concentration required for LLPS to a physiologically relevant range (Figures 3 and 4). A forced increase in the cytoplasmic concentration of hnRNPA1 and closely related RBPs is sufficient to drive stress granule assembly, which is consistent with an LLPS-mediated process (Figure 3). Fibrillization of hnRNPA1, promoted by a potent steric zipper in the LCD, is dispensable for LLPS and for the recruitment of the LCD to stress granules (Figure 2). Furthermore, protein molecules can exchange between the protein-poor and protein-rich phases on a timescale of seconds, which is consistent with the dynamics observed in stress granules in cells (Figure 1). By contrast, LCD-containing RBPs are more rigidly incorporated into hydrogels composed of uniformly polymerized amyloid-like fibers (Kato et al., 2012; Figure 1). Nevertheless, a propensity toward fibrillization is evidently a conserved feature of LCD-containing RBPs, possibly due to the sequence features giving rise to LLPS and specific interactions with other binding partners. The fibrillization propensity may also be physiologically relevant to stress granule function, as previously suggested (Kato et al., 2012), perhaps representing a process of maturation after assembly is initiated by LLPS. On the other hand, the propensity toward fibrillization within the condensed liquid environment of stress granules also poses a risk, particularly when the LCD contains a fibril-promoting mutation (such as the disease-causing D262V mutation in hnRNPA1), which can lead to excess, pathological fibrillization as observed in ALS, FTD, myopathy, and MSP and as predicted by Weber and Brangwynne (2012) (Figures 6 and 7).

LCD Sequence Properties Promoting LLPS

Proteins harboring LCDs are abundant in RNA granules and other membrane-less organelles (Anderson and Kedersha, 2006; Buchan and Parker, 2009; Voronina et al., 2011). Our finding that the LCD of hnRNPA1 mediates LLPS is consistent with recent reports that LCD-mediated LLPS of the DEAD-box helicases Ddx4 and Laf-1 plays a role in the assembly of P granules (Elbaum-Garfinkle et al., 2015; Nott et al., 2015). As with generic IDPs (Das and Pappu, 2013; Das et al., 2015; Müller-Späth et al., 2010), amino acid composition and sequence patterning likely determine the interactions within LCDs and their conformational properties, and thereby encode their ability to undergo phase separation. What might these features be? We find that LLPS of hnRNPA1 is enthalpy driven and that aromatic and electrostatic interactions are driving forces (Figure 3). Indeed, hnRNPA1 is enriched in the aromatic residues phenylalanine and tyrosine and in the positively charged residue arginine relative to the overall eukaryotic proteome (Hormoz, 2013). Moreover, the LCD sequence of hnRNPA1 is patterned: phenylalanine and tyrosine residues are relatively evenly distributed, with a mean spacing of 6.2 ± 2.3 residues (Figure S7). Positively charged residues, mainly arginines, are also well distributed (Figure S7). These residues may thus act as reiterated interaction motifs against the background of a polar polymer, enabling the multivalent interactions that drive LLPS.
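The spacing statistic quoted above can be computed directly from a sequence. The Python sketch below defines spacing as the difference in position between consecutive aromatic (F or Y) residues; that definition, the helper name, and the short toy sequence are assumptions for illustration. The toy sequence is not the hnRNPA1 LCD, so the numbers it produces are not those in Figure S7.

```python
from statistics import mean, stdev


def residue_spacings(sequence, residues="FY"):
    """Position differences between consecutive occurrences of any residue in `residues`."""
    positions = [i for i, aa in enumerate(sequence) if aa in residues]
    return [b - a for a, b in zip(positions, positions[1:])]


toy_lcd = "GGFGRSYGGNGFSGRGYG"  # hypothetical glycine-rich toy sequence, not hnRNPA1
aromatic_gaps = residue_spacings(toy_lcd)
arginine_gaps = residue_spacings(toy_lcd, residues="R")
print(f"aromatic spacings: {aromatic_gaps}")
print(f"mean spacing = {mean(aromatic_gaps):.1f} +/- {stdev(aromatic_gaps):.1f} residues")
print(f"arginine spacings: {arginine_gaps}")
```

A small standard deviation relative to the mean, as in the hnRNPA1 value reported above, indicates that the "sticker" residues are spread along the chain rather than clustered.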

Figure 6. Phase Separation Promotes Fibrillization of hnRNPA1-D262V

All experiments were performed in 50 mM HEPES, 300 mM NaCl, 5 mM DTT, and 100 mg/ml Ficoll.

(A) A1-D262V fibril accumulation on the surface of the coverslip was monitored by cycling the temperature between 10°C and 25°C. Each cycle consisted of a starting temperature of 25°C, a decrease to 10°C to allow droplet formation, and an increase back to 25°C. The images were taken at 25°C in order to visualize the surface. See also Figure S6.

(B) A1-FL, A1-D262V, or A1-Δhexa were agitated at 25°C for 24 hr. Fibrillization was monitored by ThT fluorescence.

(C) Fluorescence images of floating droplets of a mixture of Oregon-Green-labeled/unlabeled wild-type hnRNPA1 (total concentration 160 μM, molar ratio of 1:300) and Rhodamine-Texas Red-labeled/unlabeled A1-D262V (total concentration 160 μM, molar ratio of 1:300) at 16°C.

(D) Fluorescence images of a mixture of Oregon-Green-labeled/unlabeled wild-type hnRNPA1 (total concentration 160 μM, molar ratio of 1:300) and Rhodamine-Texas Red-labeled/unlabeled A1-D262V (total concentration 160 μM, molar ratio of 1:300) at 33°C. The images were taken at the indicated times at the surface of the coverslips.

(E) Schematic summarizing the experiment to correlate phase separation and fibrillization. The sample was either kept in the one-phase regime (33°C) for 35 min (red arrow), kept in the one-phase regime for 20 min and then put in the two-phase regime by decreasing the temperature to 16°C for 15 min (blue arrow), or kept in the two-phase regime for 15 min (yellow arrow). The images were taken at the indicated time points (a-e).

Multivalent RNA/RRM Domains Mediate Assembly of Large Complexes

Stress granules are, of course, composed not only of RBPs but also of untranslated mRNAs (Buchan and Parker, 2009; Teixeira et al., 2005), raising the question of what role RNA plays in mediating LLPS of RBPs. Upon LLPS of a mixture of hnRNPA1 and RNA, we find that RNA not only localizes to the dense protein phase but also reduces the hnRNPA1 concentration required for LLPS. RNA likely engages hnRNPA1 in a multivalent fashion via both the RRM domains and the LCD, resulting in large, cross-linked complexes that undergo LLPS at reduced concentrations. This interpretation is supported by several findings: (1) the folded RRM domains alone do not phase separate in the absence of RNA under any conditions tested; (2) RNA promotes LLPS of the RRM domains, presumably by engaging the two RRM domains in a multivalent fashion via several nucleotide motifs, as was previously shown for PTB/RNA interactions (Li et al., 2012), resulting in large, cross-linked complexes; and (3) the LCD also binds RNA. Thus, our results suggest that protein-protein and protein-RNA interactions work synergistically in the assembly of RNA granules.
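Why multivalency matters for forming large cross-linked complexes can be illustrated with the classical Flory-Stockmayer mean-field gelation criterion, swapped in here as an idealization rather than a model the authors used. For a mixture of A units with f_A binding sites and B units with f_B sites, an extended network appears when p_A × p_B × (f_A - 1)(f_B - 1) exceeds 1, where p_A and p_B are the fractions of sites occupied. Treating each protein as a unit with two RNA-binding RRMs and each RNA as a unit with a few RRM-binding motifs is a hypothetical mapping; the criterion also ignores loops and the additional RNA binding by the LCD.

```python
def past_gel_point(p_a, p_b, f_a, f_b):
    """Flory-Stockmayer criterion for an A_fa + B_fb mixture (mean field, no loops):
    an extended network forms when p_a * p_b * (f_a - 1) * (f_b - 1) > 1."""
    return p_a * p_b * (f_a - 1) * (f_b - 1) > 1.0


def gel_point_single_species(f):
    """Critical site occupancy for one f-functional species binding randomly: 1 / (f - 1)."""
    return 1.0 / (f - 1)


# Hypothetical valences: protein with 2 RRMs (f_b = 2) and RNAs with 2 or 4 motifs.
for rna_valence in (2, 4):
    networked = past_gel_point(0.9, 0.9, rna_valence, 2)
    print(f"RNA valence {rna_valence}: network forms at 90% occupancy? {networked}")
```

With two binding sites on each partner, only linear chains can form, however high the occupancy; adding sites to the RNA lowers the occupancy needed for an extended network, which is the qualitative point of the multivalency argument above.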

Fibrillization of RBPs in Stress Granules: Pathology or Physiological Function?

hnRNPA1 is intrinsically prone to fibrillization, a propensity that is strongly enhanced by mutations that cause ALS and MSP (Kim et al., 2013). Although fibrillization is not required for LLPS, and the disease mutation does not significantly impact LLPS behavior on short timescales (Figure 5), fibrillization of mutant hnRNPA1 is enhanced in the two-phase regime (Figure 6). We propose that the high local protein concentration in the condensed liquid droplets increases the probability of rare nucleation events and the rate at which monomers add to a growing fibril, in agreement with typical nucleation-driven fibrillization processes (Eisenberg and Jucker, 2012). Changes in the conformational ensemble and dynamics of the LCD relative to the dispersed state may also play roles in enhanced nucleation (Pappu et al., 2008). Nevertheless, the fact that wild-type hnRNPA1 is prone to fibrillization on longer timescales, a behavior shared by a large number of related RBPs (Ramaswami et al., 2013), suggests that fibrillization could also occur in RNA granules in cells and potentially contribute to physiological functions. For example, dynamic fibrils with short length scales might assemble in the condensed liquid phase during maturation of long-lived RNA granules and play functional roles or provide mechanical stability.

Irrespective of a potential physiological role in normal stress granules, our findings provide a model to explain the relationship between disease-associated genetic mutations that promote stress granule formation or prolong their assembly (Figure 7) and the fibrillar pathology that dominates end-stage disease. Thus, we propose that the condensed liquid state of stress granules resulting from LLPS of hnRNPA1 and related RBPs presents a low-probability risk of assembling amyloid-like fibrils that, under normal conditions, can be managed by granule disassembly and the cellular proteostasis machinery. However, when stress granules are composed of RBPs that contain LCD mutations that promote fibrillization (Kim et al., 2013; Klar et al., 2013; Kwiatkowski et al., 2009; Sreedharan et al., 2008; Vieira et al., 2014), or when stress granules persist due to disturbances in the disassembly machinery (Buchan et al., 2013; Figley et al., 2014), pathogenic fibrils can assemble and escape quality control surveillance.

EXPERIMENTAL PROCEDURES 

Protein Expression and Purification 

A1-FL, A1-RRM, A1-LCD, A1-D262V, and A1-Δhexa were expressed as His-SUMO-tagged fusion proteins in BL21-Gold (DE3) cells (Agilent) in LB medium. Cells were lysed with a microfluidizer in 50 mM HEPES (pH 7.5), 250 mM NaCl, 30 mM imidazole, 2 mM βME, 100 μg/ml RNase, and complete protease inhibitor cocktail (Roche). The cleared lysate was loaded onto a gravity Ni-NTA column, washed with lysis buffer, and eluted in 50 mM HEPES (pH 7.5), 300 mM NaCl, 300 mM imidazole, and 2 mM βME. The proteins were treated with 0.2 mg/ml RNase A (Roche) for 5 min at 37°C and then purified by ion exchange chromatography on a HiTrap SP or Q column (GE Healthcare). The fractions were analyzed by SDS-PAGE, pooled, and concentrated. They were then subjected to size exclusion chromatography on a Superdex 200 16/60 column (GE Healthcare) equilibrated in sample buffer (50 mM HEPES [pH 7.5], 300 mM NaCl, and 5 mM DTT). The fractions were analyzed by SDS-PAGE, pooled, concentrated, and stored at -80°C. Dynamic light scattering was used to ensure that the proteins were monomeric. The RNA levels were analyzed by polyacrylamide gel electrophoresis (Figure S1). hnRNPA1 WT was fluorescently labeled with Oregon green 488, and hnRNPA1 D262V with Rhodamine Red-X (for details, see Supplemental Experimental Procedures).

Formation of hnRNPA1 Hydrogels

Purified A1-FL was dialyzed overnight at 4°C against a gelation buffer containing 50 mM Tris-HCl (pH 7.5), 150 mM NaCl, 1 mM TCEP, 0.1 mM EDTA, and 1 mM benzamidine. The protein solution was sonicated for 10 s at 13% power on a Misonix ultrasonic liquid processor Model S4000 and then concentrated to roughly 35 mg/ml. After centrifugation, a 0.5 μl droplet of the supernatant was deposited onto a glass-bottomed microscope dish (MatTek). The dish was sealed with parafilm and incubated for 2 days at RT. The method was adapted from a previous report (Kato et al., 2012).

Cell Culture and Transfection 

HeLa and U2OS G3BP-GFP cells were cultured in DMEM (Hyclone) supplemented with 10% FBS (Hyclone) and GlutaMax-IX (GIBCO). The U2OS G3BP-GFP stable line was a gift from Paul Anderson. Cells were transfected using FuGENE6 transfection reagent (Roche) according to the manufacturer’s instructions.



FRAP Methods and Analysis 

FRAP experiments were performed using a Marianas spinning disk confocal (SDC) imaging system on a Zeiss Axio Observer inverted microscope platform. Time-lapse images of the sample were collected with 100 ms exposure time for 1 to 2 min at 5.6 frames per second using a Zeiss Plan-Apochromat 63×/1.4 NA oil objective and an Evolve 512 EMCCD camera (Photometrics). Images were analyzed with SlideBook 6 software (3i). For FRAP analysis, mean fluorescence intensities were computed from three regions of interest (ROIs) in the time-lapse images. ROI-1 was the photobleached region/droplet; ROI-2 was drawn in an area/droplet not connected to the photobleached droplet and was used to correct for overall photobleaching due to imaging laser illumination; ROI-3 was defined as background, and its signal was subtracted from both the ROI-1 and ROI-2 signals. The resulting background- and photobleaching-corrected fluorescence intensity versus time curves were expressed in fractional form, normalized to the pre-photobleach intensity (Axelrod et al., 1976), and fitted to equations for single- or double-exponential recovery. See also Table S1.
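The ROI correction and normalization described above can be sketched as follows. This is a minimal illustration, not the SlideBook 6 workflow used in the study. It assumes the common "double normalization" form, in which background-subtracted ROI-1 is divided by background-subtracted ROI-2 and each is scaled by its own mean pre-bleach value. The function names and the exponential parametrizations are hypothetical.

```python
import math


def frap_normalize(roi1, roi2, roi3, n_prebleach):
    """Return a background- and photobleaching-corrected FRAP trace.

    roi1: mean intensity of the bleached region/droplet per frame
    roi2: mean intensity of an unconnected reference region/droplet
    roi3: mean background intensity
    n_prebleach: number of frames acquired before the bleach pulse
    Assumption: 'double normalization' as described in the lead-in.
    """
    if not len(roi1) == len(roi2) == len(roi3):
        raise ValueError("ROI traces must have the same length")
    if not 0 < n_prebleach <= len(roi1):
        raise ValueError("n_prebleach must be between 1 and the trace length")
    # Subtract the background (ROI-3) from the bleached and reference ROIs.
    bleached = [a - c for a, c in zip(roi1, roi3)]
    reference = [b - c for b, c in zip(roi2, roi3)]
    # Mean pre-bleach levels used to express the trace in fractional form.
    pre_bleached = sum(bleached[:n_prebleach]) / n_prebleach
    pre_reference = sum(reference[:n_prebleach]) / n_prebleach
    # Dividing by the reference removes acquisition photobleaching; scaling by
    # the pre-bleach means sets the pre-bleach level to 1.
    return [
        (b / pre_bleached) / (r / pre_reference)
        for b, r in zip(bleached, reference)
    ]


def single_exp(t, f0, f_inf, tau):
    """Single-exponential recovery from f0 at t = 0 toward plateau f_inf."""
    return f_inf - (f_inf - f0) * math.exp(-t / tau)


def double_exp(t, f0, a1, tau1, a2, tau2):
    """Double-exponential recovery with a fast and a slow component."""
    return f0 + a1 * (1 - math.exp(-t / tau1)) + a2 * (1 - math.exp(-t / tau2))
```

In practice, these model functions would be fitted to the normalized trace by nonlinear least squares, for example with scipy.optimize.curve_fit applied to a NumPy-vectorized version of the model. The residuals would then indicate whether a second exponential component is warranted.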

Immunofluorescence Studies 

Cells were fixed in 4% paraformaldehyde in phosphate-buffered saline (PBS), permeabilized with 0.5% Triton X-100 in PBS for 10 min, blocked with 5% goat serum in PBS for 45 min, and incubated with primary antibody for 1.5 hr at RT. Primary antibodies were visualized with secondary antibodies conjugated with Alexa Fluor 488, Alexa Fluor 555, and Alexa Fluor 647 (Molecular Probes, Invitrogen), and nuclei were detected using DAPI. Stained cells were examined using a Zeiss LSM 780 NLO confocal microscope with Zeiss ZEN software.



In Vitro Determination of Phase Diagram 

Samples were prepared by mixing the determined amounts of protein, buffer, and Ficoll PM 400 (Sigma). Apparent cloud points were measured using a Linkam PE100 thermal stage mounted on a Zeiss LSM 780 NLO microscope. Sealed sample chambers containing the protein solutions consisted of coverslips sandwiching two layers of 3M 300 LSE high-temperature double-sided tape (0.34 mm) and were taped onto the PE100 silver heating/cooling block. Changes in the state of the solution were monitored as a function of temperature. For each given hnRNPA1 concentration, the sample was equilibrated at 33°C (one-phase regime). The temperature was then decreased at a rate of 2°C/min until droplets first appeared, marking the apparent cloud point. Each set of cloud points (three independent replicates) was fitted to the scaling relation for binary demixing from renormalization-group theory (Muschol and Rosenberger, 1997; Sengers, 1980; Stanley, 1971):



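$$T = T_c\left(1 - \left|\frac{C_c - C_p}{C_c}\right|^{1/\beta}\right),$$

where T is the apparent cloud-point temperature at protein concentration C_p, T_c and C_c are the critical temperature and concentration, and β = 0.325 is the critical exponent.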
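To make the fitting step concrete, the sketch below evaluates the scaling relation T = Tc·(1 − |(Cc − Cp)/Cc|^(1/β)) with β = 0.325. It then recovers Tc and Cc from (concentration, cloud-point temperature) pairs by a brute-force grid search. Three assumptions apply: the amplitude prefactor is taken as 1, temperatures are treated as absolute (kelvin), and the grid search stands in for whatever nonlinear least-squares routine was actually used. The function names are hypothetical.

```python
BETA = 0.325  # critical exponent quoted in the text


def cloud_point_temperature(cp, tc, cc, beta=BETA):
    """Cloud-point temperature predicted by the binary-demixing scaling law.

    cp: protein concentration; cc: critical concentration (same units);
    tc: critical temperature. Assumptions: amplitude prefactor of 1 and
    absolute temperatures (K).
    """
    return tc * (1.0 - abs((cc - cp) / cc) ** (1.0 / beta))


def fit_critical_point(points, tc_grid, cc_grid, beta=BETA):
    """Brute-force least-squares fit of (tc, cc) to measured cloud points.

    points: iterable of (concentration, cloud-point temperature) pairs.
    Returns the grid pair with the smallest sum of squared residuals.
    A grid search is an illustrative stand-in for nonlinear least squares.
    """
    points = list(points)
    best = None
    for tc in tc_grid:
        for cc in cc_grid:
            sse = sum(
                (t - cloud_point_temperature(c, tc, cc, beta)) ** 2
                for c, t in points
            )
            if best is None or sse < best[0]:
                best = (sse, tc, cc)
    return best[1], best[2]
```

The relation is symmetric about Cc, so the binodal has its maximum, T = Tc, at Cp = Cc. Cloud points measured on both sides of the critical concentration therefore constrain Cc best.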



Fluorescence Anisotropy 

The fluorescein-labeled RNA fl-RNA 44, with sequence GGGCCCCCGGGUACCGAGCUGCUAAUCAAAACAAAACAAAAGCU, was purchased from Sigma. For direct FA binding assays, increasing concentrations of A1-FL, A1-RRM, and A1-LCD were titrated into 40 nM fl-RNA 44 in a buffer containing 50 mM HEPES (pH 7.5), 300 mM NaCl, 0.01% Triton, and 5 mM DTT, and FA was monitored with a CLARIOstar plate reader (BMG Labtech) at 25°C. In this standard binding experiment, protein binding slows the tumbling of the labeled RNA, which is detected as an increase in fluorescence anisotropy. Analysis was performed as described previously (Roehrl et al., 2004).
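The anisotropy readout can be related to a dissociation constant through a binding model. The sketch below assumes simple 1:1 binding and assumes that binding does not change the brightness of the fluorescein label. It uses the exact quadratic solution, which accounts for depletion of free protein and matters when the labeled RNA (40 nM) is not far below Kd. Roehrl et al. (2004) treat the general case, including intensity changes. The 1:1 model is a simplification, since the RRM/RNA interaction is described above as multivalent, and the function names are hypothetical.

```python
import math


def fraction_bound_from_anisotropy(r, r_free, r_bound):
    """Fraction of labeled RNA bound.

    Assumption: no change in fluorophore brightness on binding.
    """
    return (r - r_free) / (r_bound - r_free)


def fraction_bound_1to1(p_total, rna_total, kd):
    """Exact 1:1 binding model that accounts for ligand depletion.

    All arguments share one concentration unit (for example nM). Solves the
    quadratic for the complex concentration and returns complex / rna_total.
    """
    b = p_total + rna_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * p_total * rna_total)) / 2.0
    return complex_conc / rna_total


def predicted_anisotropy(p_total, rna_total, kd, r_free, r_bound):
    """Anisotropy expected at a given protein concentration (same assumptions)."""
    fb = fraction_bound_1to1(p_total, rna_total, kd)
    return r_free + fb * (r_bound - r_free)
```

Fitting predicted_anisotropy to a titration series, with kd, r_free, and r_bound as free parameters and rna_total fixed at 40 nM, would yield an apparent Kd under these assumptions.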

In Vitro Fibril Formation Assay with Thioflavin-T Measurements 

The experiment was performed as described previously (Kim et al., 2013). In brief, 100 μM A1-WT, A1-D262V, and A1-Δhexa were each incubated at 25°C under agitation for 24 hr. Aliquots were removed at 0 and 24 hr and added to a solution of 50 μM ThT, and the fluorescence intensity at excitation/emission wavelengths of 450/550 nm was determined.



SUPPLEMENTAL INFORMATION 

Supplemental Information includes Supplemental Experimental Procedures, seven figures, two tables, and six movies and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2015.09.015.
SUMMARY

Mammalian interphase chromosomes interact with the nuclear lamina (NL) through hundreds of large lamina-associated domains (LADs). We report a method to map NL contacts genome-wide in single human cells. Analysis of nearly 400 maps reveals a core architecture consisting of gene-poor LADs that contact the NL with high cell-to-cell consistency, interspersed by LADs with more variable NL interactions. The variable contacts tend to be cell-type specific and are more sensitive to changes in genome ploidy than the consistent contacts. Single-cell maps indicate that NL contacts involve multivalent interactions over hundreds of kilobases. Moreover, we observe extensive intra-chromosomal coordination of NL contacts, even over tens of megabases. Such coordinated loci exhibit preferential interactions as detected by Hi-C. Finally, the consistency of NL contacts is inversely linked to gene activity in single cells and correlates positively with the heterochromatic histone modification H3K9me3. These results highlight fundamental principles of single-cell chromatin organization.



INTRODUCTION 

An important unresolved question in eukaryotic genome biology is how chromosomes are spatially organized inside interphase nuclei. Current evidence suggests that this organization is driven by probabilistic principles (Bickmore, 2013; Cavalli and Misteli, 2013; Gibcus and Dekker, 2013). Systematic fluorescence in situ hybridization (FISH) experiments have revealed that in a homogeneous cell population the nuclear positions of chromosomes are variable with respect to each other and relative to the periphery (Bolzer et al., 2005). However, this positioning is not entirely random; for example, in human lymphoid cells, chromosome 18 (chr18) tends to be located near the periphery, while chr19 shows a preference for the nuclear interior (Croft et al., 1999; Cremer et al., 2001).

At a smaller scale, certain individual genomic loci visualized by FISH also exhibit preferences for specific nuclear landmarks, such as the nuclear envelope (Marshall et al., 1996; Kosak et al., 2002) and nucleoli (Manuelidis and Borden, 1988; Ochs and Press, 1992), but usually with some degree of random variation. This variability is directly illustrated by in vivo tagging experiments in which loci contacting the nuclear lamina (NL) were tracked through mitosis in a clonal human cell line (Kind et al., 2013). These experiments demonstrated that a sizeable subset of loci that were associated with the NL in mother cells relocated to the nuclear interior in daughter cells, indicating that, at least to some degree, genome contacts with the NL are intrinsically variable.

Complementary to these single-cell microscopy approaches are genome-wide mapping techniques that query chromosome organization in large pools of cells (van Steensel and Dekker, 2010). For example, the 4C, 5C, and Hi-C technologies generate maps of the pair-wise spatial proximity of genomic loci (de Wit and de Laat, 2012; Dekker et al., 2013). Such maps have revealed global patterns indicating that mammalian interphase chromosomes are partitioned into domains that are roughly 200 kb-2 Mb in size (Lieberman-Aiden et al., 2009; Dixon et al., 2012; Nora et al., 2012; Rao et al., 2014). Computational models of chromosome polymer folding fitted to 5C and Hi-C data generally suggest that interphase chromosomes adopt multiple configurations that vary from cell to cell (Lieberman-Aiden et al., 2009; Bau et al., 2011; Kalhor et al., 2012; Giorgetti et al., 2014; Nagano et al., 2013).

Figure 1. Single-Cell DamID Mapping of Genome-NL Interactions

(A) Schematic representation of the single-cell DamID procedure. HI, heat inactivation step.

(B) NL contact maps for chr17 in six individual cells (black) and the average profile of 118 single cells (bottom track, green). OE, observed over expected read count ratio in contiguous 100-kb segments.

(C) Grayscale representation of OE values for 118 single cells on chromosome 17.

The gray bars underneath the axes in (B) and (C) mark unmappable regions; c, centromeric region. See also Figure S1.

Another genome-wide approach to studying chromosome architecture is the mapping of interactions with the NL, mostly by means of the DamID technology (Guelen et al., 2008; Peric-Hupkes et al., 2010; Meuleman et al., 2013). The NL provides a very large surface area for potential contacts with the genome. Indeed, DamID studies have estimated that as much as ~35% of the mammalian genome can interact with the NL in any tested cell type, although it has remained unclear how much of the genome contacts the NL in a single cell. Genome-NL interactions occur through about 1,100-1,400 discrete lamina-associated domains (LADs), which have a median size of ~0.5 Mb and are scattered across all chromosomes. Most genes in LADs are expressed at very low levels, and results of various tethering experiments (Finlan et al., 2008; Reddy et al., 2008; Kind et al., 2013; Therizols et al., 2014) point to a reciprocal relationship between gene positioning at the NL and a repressive chromatin state.

The large size of LADs and their prevalence throughout the genome strongly suggest that LAD-NL interactions play an important role in interphase chromosome architecture. Insights into the single-cell behavior of these interactions will thus enhance our fundamental understanding of chromosome organization. Here, we report a modified version of the DamID technology that is sensitive enough to generate genome-wide maps of NL contacts in single human cells at a resolution of ~100 kb, which is well below the median size of LADs. We generated a total of 395 of these single-cell maps. These maps, complemented by Hi-C analysis and super-resolution microscopy imaging, provide insights into the nature of LAD-NL interactions and uncover principles of cell-to-cell variation in chromosome architecture.

RESULTS 

Single-Cell DamID Methodology 

As a model system to develop single-cell DamID, we chose the human myeloid leukemia cell line KBM7, which is haploid for all chromosomes except for chr8 and a small part of chr15 (Kotecki et al., 1999; Burckstummer et al., 2013). Even though this haploid state is unusual for human somatic cells, it facilitates the interpretation of single-cell genome-wide maps because there is no need to discriminate the homologous chromosomes that are present in diploid cells.

We developed a DamID protocol for single cells as summarized in Figure 1A. We created a KBM7 clone (#14) that expresses an inducible fusion protein consisting of DNA adenine methyltransferase (Dam) and Lamin B1 (LmnB1) (Kind et al., 2013), as well as the Fucci two-color fluorescent reporter system to monitor the cell-cycle stage (Sakaue-Sawano et al., 2008). We then induced Dam-LmnB1 protein expression (Figures S1A and S1B), and 15 hr later we collected single cells at the onset of S phase by fluorescence-activated cell sorting (FACS) (Figure S1C). This design ensured that the harvested cells had expressed Dam-LmnB1 for most of their recent G1 phase (which lasts on average ~14.9 hr, an estimate based on doubling time and FACS data), providing sufficient time for the accumulation of adenine methylation on the genomic loci that contact the NL. Importantly, LmnB1 was strictly confined to the nuclear periphery before and after Dam-LmnB1 induction (Figure S1D).

The single FACS-sorted cells were captured directly in a small volume of lysis buffer in a 96-well plate. Subsequent sample processing consisted of only a few steps (Figure 1A): digestion with DpnI, which is highly specific for Dam-methylated GATC sequences, followed by adaptor ligation (Figure S1E) and a total of 26 cycles of PCR amplification. A key difference from the conventional DamID protocol is that these steps are all done in the same well by sequential addition of reagents, without intermediate purification or concentration steps that could lead to loss of DNA. Gel electrophoresis showed that approximately half of the wells yielded a clear smear of amplified DNA (Figure S1F). These PCR products were prepared for multiplexed Illumina sequencing by ligation of indexed adaptors, then pooled and sequenced.

Single-Cell NL Interaction Maps 

In total, we generated 118 single-cell DamID maps from KBM7 clone #14. After applying quality filters (Figure S1G), we obtained a median of 5 × 10^5 reads per cell that could be mapped to a unique genomic location (Figure S1H). On average, 92% of the mapped locations started with a GATC motif, which is the recognition sequence of both Dam and DpnI, indicating that the detection is highly specific. For each cell, we binned these reads into contiguous 100-kb genomic segments and then calculated, for each segment, an observed over expected (OE) score. The score was based on the number of recovered unique reads, the theoretical maximum number of mappable unique reads in the segment, and the total genome-wide read count obtained from that cell. OE scores >1 indicate more Dam-LmnB1 methylation than would be expected by random chance.
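The text names the three ingredients of the OE score but not the exact formula. One plausible form, assumed here, spreads a cell's total unique reads over the segments in proportion to each segment's mappable capacity and divides the observed count by that expectation. Segments with no mappable positions receive no score. The function name and the None convention are hypothetical.

```python
def oe_scores(observed, mappable):
    """Observed-over-expected (OE) scores for one cell (assumed formula).

    observed: unique reads recovered per 100-kb segment.
    mappable: theoretical maximum number of mappable unique reads per segment.
    Returns one OE score per segment, or None for unmappable segments.
    """
    if len(observed) != len(mappable):
        raise ValueError("observed and mappable must have the same length")
    total_reads = sum(observed)       # genome-wide read count of this cell
    total_mappable = sum(mappable)    # genome-wide mappable capacity
    if total_reads == 0 or total_mappable == 0:
        raise ValueError("need at least one read and one mappable position")
    scores = []
    for obs, cap in zip(observed, mappable):
        if cap == 0:
            scores.append(None)       # no expectation can be formed
            continue
        # Assumption: expected reads proportional to mappable capacity.
        expected = total_reads * cap / total_mappable
        scores.append(obs / expected)
    return scores
```

Under this form, a segment scores exactly 1 when it receives its proportional share of the cell's reads. This matches the reading of OE > 1 as enrichment over random chance.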

Essentially all cells showed a striking domain pattern of OE scores along most chromosomes, as illustrated for chr17 in Figures 1B and 1C. This pattern is reminiscent of the LAD profiles previously published for populations of cells. However, clear differences can be observed between individual cells (Figure 1B), often with entire megabase-sized domains missing, which is suggestive of cell-to-cell variation in NL contacts.

We note that the chosen segment size of 100 kb is a compromise between resolution and noise. Because the median human LAD size is approximately 0.5 Mb (Guelen et al., 2008), segments of 100 kb are expected to capture most of the LAD organization. Indeed, the OE scores in adjacent segments within each of the 118 single-cell samples have a Pearson correlation coefficient of 0.70 ± 0.06 (mean ± SD), indicating that neighboring segments report NL interactions in a partially redundant fashion with acceptable noise levels. For reference, at 100-kb resolution our previously published Dam-LmnB1 profiles from pools of human Tig3, hESC, and HT1080 cells (Guelen et al., 2008; Kind et al., 2013; Meuleman et al., 2013) show a correlation of 0.88-0.90 between neighboring segments. The single-cell correlation of OE scores between adjacent segments is not related to the number of reads per cell (Pearson’s correlation coefficient of 0.06, p = 0.51), indicating that read depth does not limit the quality of our data at 100-kb resolution.
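The adjacent-segment correlation used above as a noise check amounts to correlating one cell's OE vector with itself shifted by one segment. The sketch below is minimal and assumes that unmappable segments are marked None and that any pair touching one is skipped. The function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def adjacent_segment_correlation(oe):
    """Correlate each segment's OE score with that of its right neighbor.

    Assumption: pairs involving an unmappable segment (None) are skipped.
    """
    pairs = [(a, b) for a, b in zip(oe, oe[1:]) if a is not None and b is not None]
    return pearson([a for a, _ in pairs], [b for _, b in pairs])
```

A domain-like profile, with runs of high and low OE, gives a positive value. A profile that alternates from one 100-kb segment to the next drives the value toward -1, which would indicate a segment size finer than the underlying structure.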

Validation of Single-Cell Maps 

To further gauge the overall quality of these data, we first reconstructed a population profile by averaging the maps of the 118 single cells and then compared it to a conventional microarray-based DamID profile generated from a large pool of KBM7 cells (Figure 2A). The highly similar domain patterns and an overall Spearman rank correlation coefficient of 0.90 demonstrate that the new protocol captures the same regions of interaction as the previous, well-validated protocol.
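This validation averages the single-cell OE profiles and compares the result with a population DamID profile using Spearman's rank correlation, which is the Pearson correlation of the ranks. The sketch below assumes the profiles are already aligned segment by segment with unmappable segments removed, and it gives tied values their average rank. It illustrates the statistic and is not the authors' code. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def mean_profile(profiles):
    """Average per-segment OE across cells (assumes no missing values)."""
    n_cells = len(profiles)
    return [sum(col) / n_cells for col in zip(*profiles)]
```

A rank-based statistic suits this comparison because microarray log-ratios and sequencing-based OE scores are on different scales. Spearman's coefficient only requires the two profiles to order the segments similarly.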

However, in conventional DamID, a Dam-only control is typically included to normalize for chromatin accessibility and other potential biases (Vogel et al., 2007; Guelen et al., 2008). This normalization is not possible in single-cell DamID because the Dam-LmnB1 and Dam-only profiles cannot be obtained from the same cell. Nevertheless, we established a KBM7 clone expressing Dam-only and Fucci and mapped the adenine methylation patterns in 26 single cells. The resulting patterns are very different from the Dam-LmnB1 profiles (Figures 2B and 2C). In general, regions that have no detectable Dam-LmnB1 signal show clear Dam-only signals; hence, these regions are not intrinsically undetectable. Conversely, regions with very high Dam-only signals generally do not show a Dam-LmnB1 signal, indicating that there is no strong accessibility bias in our Dam-LmnB1 maps. We conclude that omitting the Dam-only normalization is acceptable for single-cell Dam-LmnB1 profiling.

Next, we performed multi-color fluorescence in situ hybridization (FISH) with probes for six genomic loci covering a broad range of average OE scores (Figures 2D-2F). Analysis of hundreds of nuclei revealed a good correspondence between the average distance to the periphery according to FISH and the average OE scores (Spearman’s ρ = 0.94), confirming that our single-cell Dam-LmnB1 profiles provide a view of the spatial organization of the genome relative to the NL. Together, these data indicate that single-cell DamID using Dam-LmnB1 generates NL interaction profiles with low noise levels, suitable resolution, acceptable bias, and good correspondence to localization by FISH.

Cell-to-Cell Variability and Consistency of Genome-NL 
Associations 

Visual inspection of the collection of single-cell maps suggested that some regions interact more frequently with the NL than others (Figures 1B and 1C). To analyze this systematically, we first converted the OE scores for each cell to a binary NL contact map. For this, we used an OE score cutoff of 1, motivated by the bimodal distribution of OE scores (Figure S2A), which suggests that loci are either in a “contact” or a “no-contact” state. We then calculated for each 100-kb segment the NL contact frequency (CF), defined as the proportion of cells in which this segment contacted the NL. This data processing does not lead to a substantial loss of information content. The Spearman rank correlation coefficient between the average radial position of the six FISH probes and CF is 0.90 (compared to 0.94 for OE values, see above), and CF values correlate strongly with the average OE scores (Figure S2B). CF values are also highly robust with respect to sequencing read depth: subsampling the single-cell data to the second-lowest read count (1.04 × 10^5 reads; an average 3.1-fold downsampling) does not affect CF values (Figure S2C).
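The binarization and contact frequency calculation can be sketched as below. The sketch makes two assumptions that the text does not state: an OE strictly greater than the cutoff of 1 counts as contact, and a cell without a score for a segment is left out of that segment's denominator. The function name is hypothetical.

```python
def contact_frequency(oe_by_cell, cutoff=1.0):
    """Per-segment NL contact frequency (CF).

    oe_by_cell: one list of OE scores per cell, all in the same segment order;
    None marks a segment without a score in that cell.
    Assumptions: contact means OE > cutoff (strict); cells lacking a score
    for a segment are excluded from that segment's denominator.
    """
    n_segments = len(oe_by_cell[0])
    cfs = []
    for s in range(n_segments):
        values = [cell[s] for cell in oe_by_cell if cell[s] is not None]
        if not values:
            cfs.append(None)
            continue
        in_contact = sum(1 for v in values if v > cutoff)
        cfs.append(in_contact / len(values))
    return cfs
```

Because each cell contributes only a binary call per segment, a moderate change in a cell's total read count shifts OE values without necessarily flipping calls across the cutoff. This is consistent with the reported robustness of CF to downsampling.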

Strikingly, CFs vary widely across the genome (Figures 3A and 3B). About 23% of the 100-kb segments show no detectable contact with the NL in any of the 118 cells and thus are very stably located in the nuclear interior. Conversely, ~15% of the segments have CFs >80%, representing loci that are consistently located at the NL. About 34% of the segments have contact frequencies in the range of 20%-80% and thus show high cell-to-cell variability in their NL associations. The remaining loci (29%) only occasionally contact the NL (0 < CF < 20%).
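The four CF classes above can be assigned with simple thresholds. The text does not say how the boundary values CF = 20% and CF = 80% are handled, so placing both in the variable class is an assumption here. The class labels and function names are hypothetical.

```python
from collections import Counter


def cf_class(cf):
    """Assign a CF (fraction between 0 and 1) to one of four classes.

    Assumption: CF values of exactly 0.2 and 0.8 count as 'variable'.
    """
    if cf == 0:
        return "never"        # no contact in any cell
    if cf < 0.2:
        return "occasional"   # 0 < CF < 20%
    if cf <= 0.8:
        return "variable"     # 20%-80%
    return "consistent"       # CF > 80%


def class_fractions(cfs):
    """Fraction of segments (ignoring None) that falls into each class."""
    valid = [cf for cf in cfs if cf is not None]
    counts = Counter(cf_class(cf) for cf in valid)
    return {name: counts[name] / len(valid)
            for name in ("never", "occasional", "variable", "consistent")}
```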

These different classes of loci are scattered throughout the genome (Figure 3A), although the smaller chromosomes tend to have a lower density of stable NL contacts than larger chromosomes. An exception to this rule is chr18, which harbors many regions with high CFs. This contrasts in particular with chr19, which exhibits only a few contact sites and very low CFs. This matches previous chromosome painting studies that found chr18 to be preferentially located at the nuclear periphery and chr19 in the nuclear interior (Croft et al., 1999; Cremer et al., 2001). An intriguing pattern is visible on chrX (Figure 3A, bottom left). The distal arms of this chromosome have many stable NL contacts, while the centromere-proximal ~40 Mb show only variable contacts.

Figure 2. Validation of Single-Cell DamID Maps

(A) Comparison of NL contact maps for chr1 representing the average of 118 single-cell profiles (top profile) and a conventional DamID map generated with a population of ~1-2 × 10^5 cells (bottom profile). The genome-wide correlation between the two methods is 0.90 (Spearman’s ρ).

(B) Average OE score of 118 single-cell Dam-LmnB1 samples (y axis) versus the average OE score of 26 single-cell Dam-only samples (x axis).

(C) Comparison of average OE scores obtained with Dam-LmnB1 (top track, average of 118 cells, same as in Figure 1B) and Dam-only (bottom track, average of 26 cells). OE scores for the individual Dam-only cells are shown as grayscale-encoded rows in the center frame. Gray bars underneath the bottom axis mark unmappable regions; c, centromeric region.

(D-F) Multi-color 3D DNA FISH microscopy with probes for six genomic loci covering a broad range of average OE scores distributed on chr1 (n = 677) (D) and chr17 (n = 973) (E). Graphs depict the distributions of radial probe positions, with zero corresponding to the nuclear edge and one to the centroid. Three representative nuclei with three-color FISH signals are displayed below the graphs; DNA staining with DAPI is shown in gray. (F) Mean radial positions of the six probes versus the mean DamID OE scores. Numbers 1-6 correspond to probe numbers in (D) and (E). The dotted line shows the linear regression fit.

Figure 3. NL Contact Frequencies Are Linked to Developmental Dynamics, Gene Density, and Ploidy

(A) Estimated contact frequency maps for all chromosomes in clone #14 cells. KBM7 cells carry a balanced translocation between chr9 and chr22 (Burckstummer et al., 2013); vertical dotted lines mark the junctions. Centromeric regions are indicated by gray bars; telomeres are marked by black triangles. chr8 is not shown because it is diploid.

(B) Cumulative histogram of genome-wide CF values.

(C) Distribution of genomic segments with indicated CFs over constitutive (c) and facultative (f) LADs and inter-LADs (iLADs).

(D) Average number of transcription start sites per 100-kb segment, plotted as a function of CF.

(E) Comparison of CFs in diploid cells and pseudo-diploid cells. The latter are simulated by combining equal numbers of sequence reads from pairs of haploid cells.

See also Figure S2 and Table S1.

In order to confirm these CF patterns, we used an independently derived KBM7 clone that also expresses Dam-LmnB1 and the Fucci system (clone #5.5) to generate a total of 168 single-cell maps. The genome-wide CF profile of clone #5.5 is highly similar to that of clone #14 (Pearson correlation, r = 0.97; Figures S2D and S2E). However, the absolute CF values in clone #5.5 are systematically 1.3-fold lower than in clone #14, which we attribute to a somewhat lower activity of the Dam-LmnB1 protein in clone #5.5 (Figure S2F). Nevertheless, the relative CF differences between loci are highly consistent between the two clones. We cannot rule out that the CF values of clone #14 are underestimated. However, the highest CF values of clone #14 are in the range of ~95%, and CF values of the two clones are tightly linear; together, these observations suggest that any underestimate in clone #14 is minor.

Stable and Variable NL Contacts Are Linked to Degrees 
of Developmental Plasticity 

Previously, we reported that some LADs interact with the NL in a cell-type-specific (facultative) fashion, while other LADs do so in a cell-type-invariant (constitutive) manner (Peric-Hupkes et al., 2010; Meuleman et al., 2013). We investigated whether this distinct behavior is linked to differences in CF. We used a collection of conventional microarray-based DamID maps of NL interactions in nine human cell lines of diverse origin (C.A.d.G. and B.v.S., unpublished data; see the Supplemental Experimental Procedures) to classify each 100-kb segment as constitutive LAD (cLAD), constitutive inter-LAD (ciLAD), facultative LAD (fLAD), or facultative inter-LAD (fiLAD). The latter are inter-LAD regions that are not associated with the NL in KBM7 cells but do interact with the NL in at least one other cell type.

As expected, ciLADs have low CF values (Figure 3C), consistent with their definition as not associated with the NL. Likewise, cLADs tend to have high CF values. Thus, these constitutive regions are not only invariant between different cell types but also tend to be consistently positioned relative to the NL within one cell type. In contrast, facultative LADs and iLADs have mostly intermediate CF values: fLADs generally have lower CF values than cLADs (p < 2.2 × 10^-16, Wilcoxon test), and fiLADs have higher values than ciLADs (p < 2.2 × 10^-16). The partial overlap in the CF distributions of fLADs and fiLADs indicates that the definitions of these LAD classes are not perfect, which may be due to differences in the data types (single-cell versus population-based DamID maps; sequencing versus microarray) and resolution. These results uncover a link between CF and the consistency of NL interactions between different cell types.

We previously reported that cLADs have a ~2-fold lower gene density than ciLADs (Meuleman et al., 2013). CF values are more dramatically linked to gene density: regions with CF >80% have a ~20-fold lower gene density than regions with no NL contacts (Figure 3D). The few genes that are located in such high-CF regions are enriched for gene ontology categories very divergent from myeloid cell functions, most prominently the olfactory receptor genes (Table S1), which are rarely expressed.

A picture thus emerges in which most cLADs are relatively consistently associated with the NL, providing a structural backbone to chromosomes that is largely invariant between individual cells and also between cell types. In contrast, regions with cell-type-specific NL interactions generally interact less consistently with the NL, contributing to cell-to-cell variability in the spatial organization of chromosomes.



Ploidy of KBM7 Cells Primarily Affects Variable NL 
Contacts 

We considered the possibility that competition between LADs could explain why some LADs contact the NL in only a fraction of all cells: such LADs may have lower NL binding affinities and fail to compete with stronger LADs in some of the cells. We wondered whether this balance could be altered by changing the total amount of genomic DNA in the nucleus. To test this, we took advantage of the fact that KBM7 cells spontaneously form diploid cells at low frequency (Kotecki et al., 1999). Such diploid cells should be genetically identical to the haploid cells, except for their ploidy. We derived a clonal diploid line from clone #14, with normal cell-cycle behavior (Figure S2G), and generated a total of 51 single-cell Dam-LmnB1 contact maps.

Comparison of the single-cell maps from haploid and diploid cells is not straightforward, because our current DamID method cannot determine whether the sequence reads from diploid cells are derived from one homolog or both. We therefore constructed “pseudo-diploid” reference maps by combining equal numbers of sequence reads from pairs of single haploid cells, then merging the reads and processing the data as above, as if they were derived from a single cell. These pseudo-diploid maps are thus simulated maps in which two homologs of each chromosome are present but in which increased ploidy cannot have any biological effect (because the homologs were in different cells), while technical skews due to the inability to discriminate the two homologs should be identical to those in diploid cells.
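The pooling step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each single-cell dataset is a plain list of mapped reads represented as hypothetical (chromosome, position) tuples, and the random down-sampling and random pairing of cells are assumptions about how "equal numbers of sequence reads" and "pairs of single haploid cells" were obtained.

```python
import random
from typing import List, Tuple

# Hypothetical read representation: (chromosome, genomic position of a mapped read)
Read = Tuple[str, int]

def make_pseudo_diploid(cell_a: List[Read], cell_b: List[Read], seed: int = 0) -> List[Read]:
    """Pool equal numbers of reads from two haploid single-cell datasets."""
    rng = random.Random(seed)
    n = min(len(cell_a), len(cell_b))  # same read count from each cell
    pooled = rng.sample(cell_a, n) + rng.sample(cell_b, n)
    pooled.sort()  # order by chromosome, then position, as for a single cell
    return pooled

def pair_cells(cells: List[List[Read]], seed: int = 0) -> List[List[Read]]:
    """Randomly pair haploid cells; with an odd count, one cell is left unused."""
    rng = random.Random(seed)
    idx = list(range(len(cells)))
    rng.shuffle(idx)
    return [make_pseudo_diploid(cells[i], cells[j], seed)
            for i, j in zip(idx[0::2], idx[1::2])]
```

Each pooled list would then be binned and binarized exactly like a real single-cell map, which is what makes the technical skews comparable to those of genuine diploid cells.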

Although the overall pattern of NL contacts was similar between pseudo-diploid and diploid cells, many loci showed decreased CFs in diploid cells compared to haploid cells (Figure 3E), while increased CFs were rare. Strikingly, these changes occurred preferentially in genomic segments that have intermediate CFs in pseudo-diploid (and thus haploid) cells. This result indicates that diploidization leads to preferential loss of NL interactions of LADs that contact the NL less robustly in haploid cells. We obtained similar results with 32 single cells from a diploid clonal line derived from clone #5.5 (Figure S2H). The loss of NL contacts in diploid cells suggests increased competition for NL contacts. This may be caused by a reduced surface-to-volume ratio and an increase in the average distance of loci to the NL, both of which are expected to accompany the increase in nuclear volume due to the doubled DNA content. However, other mechanisms cannot be ruled out.
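The size of the expected geometric effect can be estimated under a deliberately simple model that is an assumption, not something measured in the paper: a spherical nucleus whose volume is proportional to DNA content, with loci distributed uniformly in its volume. For a sphere, S/V = 3/R, and the mean distance of a uniformly placed point to the surface is R/4, so both quantities follow directly from how R scales.

```python
def sphere_scaling(dna_fold: float):
    """Return (S/V fold-change, mean locus-to-surface distance fold-change).

    Assumes a spherical nucleus with volume proportional to DNA content.
    """
    radius_fold = dna_fold ** (1.0 / 3.0)  # V ~ R^3, so R ~ V^(1/3)
    sv_fold = 1.0 / radius_fold            # S/V = 3/R for a sphere
    # For points uniform in a ball of radius R, the mean distance to the surface
    # is R/4, so it scales linearly with R.
    mean_dist_fold = radius_fold
    return sv_fold, mean_dist_fold

sv, dist = sphere_scaling(2.0)
print(f"S/V ratio: x{sv:.3f}, mean distance to NL: x{dist:.3f}")
# prints "S/V ratio: x0.794, mean distance to NL: x1.260"
```

Under these assumptions, doubling the DNA content reduces the lamina surface available per unit volume by about 21% and moves an average locus about 26% farther from the NL; real nuclei are not perfect spheres, so these numbers only indicate the direction and rough magnitude of the effect.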

Figure 4. Evidence for Multivalent NL Interactions

(A) Correlation between the mean CF and the length of LADs. Here, LADs are defined as continuous regions in which all 100-kb segments have CF >1% across all 118 clone #14 cells. The curve shows the running mean with bin size 15.

(B) Binarized NL contact map of the p arm of chr7 in 118 cells (top), and the same data after a random shuffling procedure that keeps CFs and the number of contacts per cell the same, to simulate the complete absence of coordination between neighboring segments (bottom). The gray bar marked “c” indicates the centromeric region. Right-hand panels are magnified views of the regions outlined by red boxes in the left-hand panels.

(C) Distribution of genome-wide NL contact run lengths in 118 single-cell datasets (blue) compared to 100 sets of randomized data (gray).

(D) Genome-wide length distribution of no-contact runs in real (blue) and 100 sets of randomized (gray) data. No-contact runs present in all 118 cells (i.e., regions that never contact the NL) were excluded from this analysis.

See also Figure S3.

Single-Cell Maps Point to a Multivalent Mechanism of LAD-NL Interactions

We wondered which biophysical principle could explain the differences in apparent NL affinity between genomic loci. The domain pattern of NL contacts suggests that these interactions are not mediated by focal attachments but rather by multivalent interactions within each LAD. Because interactions with higher valency typically have higher avidity, we predicted that long LADs interact more stably with the NL than short LADs. To analyze this, we defined LADs as continuous stretches of 100-kb segments with CF >1% across the 118 cells. This yielded a total of 1,358 LADs. Strikingly, the mean CF within each LAD shows a clear positive correlation with LAD length (Spearman’s ρ = 0.81; p < 2.2 × 10⁻¹⁶), reaching a plateau for sizes larger than ~6-8 Mb (Figure 4A). This supports a model of multivalent local genome-NL interactions.
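The LAD definition used here (maximal runs of consecutive 100-kb segments whose CF exceeds 1%) and the rank correlation between LAD length and mean CF can be sketched as follows. This assumes a per-chromosome list of CF values in genomic order; the helper names and the tie-averaging Spearman implementation are illustrative and not taken from the authors' code.

```python
from typing import List, Tuple

BIN_KB = 100  # segment size used in the paper

def call_lads(cf: List[float], threshold: float = 0.01) -> List[Tuple[int, int]]:
    """Return (start, end) bin indices (end exclusive) of runs with CF > threshold."""
    lads, start = [], None
    for i, value in enumerate(cf):
        if value > threshold and start is None:
            start = i                      # a run begins
        elif value <= threshold and start is not None:
            lads.append((start, i))        # a run ends
            start = None
    if start is not None:                  # run reaching the chromosome end
        lads.append((start, len(cf)))
    return lads

def ranks(xs: List[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x: List[float], y: List[float]) -> float:
    """Spearman's rho as the Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))

cf = [0.0, 0.05, 0.2, 0.0, 0.6, 0.7, 0.8, 0.5, 0.0, 0.02]  # toy CF profile
lads = call_lads(cf)
lengths_kb = [(e - s) * BIN_KB for s, e in lads]
mean_cf = [sum(cf[s:e]) / (e - s) for s, e in lads]
print(lads)                           # [(1, 3), (4, 8), (9, 10)]
print(spearman(lengths_kb, mean_cf))  # 1.0
```

The toy profile is invented for illustration; on real data, the same two steps would yield the 1,358 LADs and the length-CF correlation reported above.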

This multivalency model predicts that there are continuous stretches of contact in individual LADs in single cells. Indeed, such runs of contact were frequently observed (black horizontal streaks in Figure 4B, top). For comparison, we randomly shuffled the contact data to simulate a “random button” model, in which each segment maintains exactly the same CF as in the real data but contacts the NL independently of its neighboring segments (Figure 4B, bottom). This pattern has a much more fine-grained appearance than the original contact maps, with fewer long runs of contact. Quantitative analysis showed that long contact runs, particularly above ~1.5 Mb, occur more frequently in the real data than in the simulated random button model (Figure 4C). This supports the multivalent interaction model.
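The exact shuffling algorithm is not described in this section. One standard way to randomize a binary cells × segments matrix while keeping every segment's CF (column sums) and every cell's number of contacts (row sums) fixed is checkerboard swap randomization, sketched below as an assumed stand-in. Runs are then counted as maximal stretches of identical values per cell, so the same helper serves for contact runs (Figure 4C) and no-contact runs (Figure 4D).

```python
import random
from collections import Counter
from typing import List

Matrix = List[List[int]]  # rows = single cells, columns = consecutive 100-kb segments

def swap_randomize(m: Matrix, n_swaps: int, seed: int = 0) -> Matrix:
    """Margin-preserving randomization by repeated 2x2 'checkerboard' swaps.

    Requires at least two rows and two columns.
    """
    rng = random.Random(seed)
    m = [row[:] for row in m]  # do not modify the input
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(n_swaps):
        r1, r2 = rng.sample(range(n_rows), 2)
        c1, c2 = rng.sample(range(n_cols), 2)
        # Swap only a [[1,0],[0,1]] or [[0,1],[1,0]] submatrix: flipping it
        # leaves every row sum and every column sum unchanged.
        if m[r1][c1] == m[r2][c2] and m[r1][c2] == m[r2][c1] and m[r1][c1] != m[r1][c2]:
            m[r1][c1], m[r1][c2] = m[r1][c2], m[r1][c1]
            m[r2][c1], m[r2][c2] = m[r2][c2], m[r2][c1]
    return m

def run_lengths(m: Matrix, value: int = 1) -> Counter:
    """Histogram of lengths (in segments) of maximal runs equal to `value` (0 or 1)."""
    hist: Counter = Counter()
    for row in m:
        run = 0
        for x in row + [1 - value]:  # sentinel closes a run at the row end
            if x == value:
                run += 1
            elif run:
                hist[run] += 1
                run = 0
    return hist
```

In a comparison like Figure 4C, one would compute `run_lengths` on the real matrix and on many independently seeded randomizations; for no-contact runs (`value=0`), segments that never contact the NL in any cell would first be removed, following the Figure 4D legend.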

Long-Range Coordination of NL Contacts within Chromosomes

Interestingly, we also observed long runs of no-contact on many single chromosomes (white horizontal streaks in Figure 4B, top). Particularly in the range of >5 Mb, such runs occur more frequently than expected by random chance (Figure 4D). We interpret these long runs of no-contact as large chromosomal regions (often including multiple LADs) that sporadically dissociate completely from the NL. These long no-contact runs are not generally due to loss of large chromosomal regions during sample processing, because our Dam-only single-cell maps have more homogeneous methylation patterns (Figure 2C) and lack long runs that are completely devoid of mappable sequence reads (Figure S3A).

This coordinated detachment of neighboring LADs from the NL prompted us to search systematically for evidence of coordination of NL contacts by calculating pairwise NL contact correlations for all 100-kb segments within each chromosome, based on the binary NL contact maps of the 118 single cells. The resulting correlation matrices, which indicate how pairs of loci may coordinately attach to and detach from the NL in single cells, showed remarkable plaid-like patterns on all chromosomes (Figure 5A). Along the diagonal, squares of consistently positive correlations represent units of coordinated NL contacts, which tend to be one to several megabases in size. Most of these units exhibit additional off-diagonal correlations with other units, sometimes with striking specificity (Figure 5A, examples marked with arrows and boxes). Such coordinated units can be tens of megabases apart, although the frequency decays gradually with distance (Figure 5B). Randomized contact maps yield much lower average correlation coefficients over all distances, demonstrating that the prevalence of positive correlations is not due to random chance (Figure 5B). Intra-chromosomal coordination is on average higher than inter-chromosomal coordination, and the latter is not higher than expected by random chance (Figure S4A). These results point to extensive coordination of NL contacts within chromosomes, often over long genomic distances.
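The pairwise correlation matrix and its summary as a function of linear distance (the median curve in Figure 5B) can be sketched as below. It assumes a binary cells × segments matrix for one chromosome. For 0/1 data the Pearson r is the phi coefficient, and segments that contact the NL in all or none of the cells have no defined correlation, which this sketch handles by returning `None`; that handling and the function names are illustrative choices.

```python
from statistics import median
from typing import Dict, List, Optional

Matrix = List[List[int]]  # rows = single cells, columns = 100-kb segments (0/1 contacts)

def segment_correlation(m: Matrix, i: int, j: int) -> Optional[float]:
    """Pearson r between segments i and j across cells (phi coefficient for 0/1 data)."""
    x = [row[i] for row in m]
    y = [row[j] for row in m]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # segment in contact in all or none of the cells: r undefined
    return sxy / (sxx * syy) ** 0.5

def correlation_matrix(m: Matrix) -> List[List[Optional[float]]]:
    """All pairwise segment correlations within one chromosome."""
    n_seg = len(m[0])
    return [[segment_correlation(m, i, j) for j in range(n_seg)] for i in range(n_seg)]

def median_by_distance(corr: List[List[Optional[float]]], max_dist: int) -> Dict[int, float]:
    """Median r over all segment pairs that are d segments apart, for d = 1..max_dist."""
    n_seg = len(corr)
    out: Dict[int, float] = {}
    for d in range(1, max_dist + 1):
        rs = [corr[i][i + d] for i in range(n_seg - d) if corr[i][i + d] is not None]
        if rs:
            out[d] = median(rs)
    return out
```

Running the same summary on margin-preserving randomized matrices gives the kind of baseline shown in gray in Figure 5B; the 90th and 99th percentiles could be obtained analogously with `statistics.quantiles`.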

Coordination of NL Contacts Is Partially Linked to Physical Proximity as Detected by Hi-C

The plaid-like correlation patterns reminded us of patterns commonly observed with Hi-C (Lieberman-Aiden et al., 2009; Dixon et al., 2012), a technology that maps the proximity of genomic sequences in nuclear space (Dekker et al., 2013). An intriguing possibility was therefore that regions with coordinated NL contacts could be in spatial proximity to one another. To investigate this, we generated Hi-C maps from KBM7 clone #14 cells. The resulting Hi-C interaction matrices appeared partially similar to the NL contact correlation matrices (Figure 5C).

To quantify the similarity, we calculated the correlation between the degree of NL contact coordination and the Hi-C interaction frequency as a function of genomic distance (Figure 5D). Hi-C interactions correlate moderately but significantly with the degree of NL contact coordination. This correlation is most prominent at ~1-2 Mb distance and declines gradually over longer distances, but is still statistically significant (p < 0.001) at ~100 Mb. Somewhat surprisingly, this positive correlation appears absent among directly neighboring 100-kb segments. One possibility is that pairs of adjacent 100-kb segments have very high Hi-C interaction frequencies due to their physical linkage, regardless of any coordination of NL contacts; indeed, our analyses show that finer-scale features of Hi-C do not always correspond to coordination of NL contacts (see below). Together, these data indicate that over a broad range of linear distances, coordinated NL contacts tend to be linked to close proximity in nuclear space.
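Computing the correlation separately at each genomic distance matters because both Hi-C contact frequency and NL contact coordination decay with distance; pooling all distances would mostly measure that shared decay. The sketch below correlates the two matrices along each diagonal. The choice of Pearson correlation, the log transform of Hi-C counts, and the minimum of three pairs per distance are assumptions, as is the omission of the significance test (p < 0.001) used in the paper.

```python
import math
from typing import Dict, List, Optional, Sequence

def _pearson(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5

def coordination_vs_hic(nl_corr: List[List[Optional[float]]], hic: List[List[float]],
                        max_dist: int, log_hic: bool = True) -> Dict[int, float]:
    """For each distance d (in segments), correlate NL-contact r with Hi-C over pairs (i, i+d)."""
    n = len(hic)
    out: Dict[int, float] = {}
    for d in range(1, max_dist + 1):
        xs, ys = [], []
        for i in range(n - d):
            r, h = nl_corr[i][i + d], hic[i][i + d]
            if r is None or h <= 0:
                continue  # skip undefined correlations and empty Hi-C bins
            xs.append(r)
            ys.append(math.log(h) if log_hic else h)
        if len(xs) >= 3:  # too few pairs at the largest distances
            value = _pearson(xs, ys)
            if value is not None:
                out[d] = value
    return out
```

Plotting the returned values against d gives a curve of the kind shown in Figure 5D.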

Previous Hi-C studies have revealed multi-scale compartmentalization of chromatin. At the highest level, megabase-sized domains are segregated into two main compartments that can be identified by eigenvector decomposition of the Hi-C matrix (Lieberman-Aiden et al., 2009). The CF pattern shows a remarkably tight correlation with this compartment score (Figure S4B; genome-wide Spearman’s ρ = -0.88, weighted average of by-chromosome correlations), indicating that the two main Hi-C compartments largely correspond to NL-interacting and internally located chromatin.
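A simplified version of the eigenvector approach of Lieberman-Aiden et al. (2009) is sketched below: normalize each contact count by the mean count at the same distance (observed/expected), compute the Pearson correlation between rows, and take the leading eigenvector of that correlation matrix by power iteration. This omits matrix balancing and other steps of production pipelines, and it assumes no row of the observed/expected matrix is constant. The sign of the eigenvector is arbitrary; it is conventionally oriented (for example by gene density) before any comparison with CF values.

```python
from typing import List

def observed_over_expected(hic: List[List[float]]) -> List[List[float]]:
    """Divide each contact count by the mean count at the same genomic distance."""
    n = len(hic)
    expected = [sum(hic[i][i + d] for i in range(n - d)) / (n - d) for d in range(n)]
    return [[hic[i][j] / expected[abs(i - j)] if expected[abs(i - j)] > 0 else 0.0
             for j in range(n)] for i in range(n)]

def row_correlation(m: List[List[float]]) -> List[List[float]]:
    """Pearson correlation between rows; rows are assumed not to be constant."""
    n = len(m)
    dev = []
    for row in m:
        mu = sum(row) / len(row)
        dev.append([v - mu for v in row])
    norms = [sum(v * v for v in row) ** 0.5 for row in dev]
    return [[sum(a * b for a, b in zip(dev[i], dev[j])) / (norms[i] * norms[j])
             for j in range(n)] for i in range(n)]

def compartment_score(oe: List[List[float]], n_iter: int = 200) -> List[float]:
    """Leading eigenvector of the row-correlation matrix, found by power iteration."""
    c = row_correlation(oe)
    n = len(c)
    v = [float(i + 1) for i in range(n)]  # arbitrary, non-symmetric start vector
    for _ in range(n_iter):
        w = [sum(c[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            raise ValueError("start vector is orthogonal to the leading eigenvector")
        v = [x / norm for x in w]
    return v  # sign is arbitrary
```

Power iteration suffices here because a correlation matrix is positive semidefinite, so its largest eigenvalue is also the largest in magnitude; segments with the same sign fall into the same compartment.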

At a finer scale, the genome is partitioned into topologically associated domains (TADs), which are discrete domains that have many intra-domain Hi-C interactions but relatively few interactions with neighboring TADs (Dixon et al., 2012; Nora et al., 2012). We computed TAD boundary positions and compared them to the NL contact coordination patterns. We found examples in which TAD boundaries coincide with borders of units of coordinated NL contacts, but also cases in which they do not (Figure S4B). We found that only TAD boundaries that coincide with strong transitions in CF values also mark the edges of units of NL contact coordination, whereas TAD boundaries located in regions with relatively uniform CF values typically do not (Figure S4C). This is not due to a general difference in TAD boundary strength, because the average TAD boundary scores were highly similar (Figure S4C, bottom). We conclude that TADs overlap only partially with units of coordinated NL contacts.

Chromosomal CF Patterns Suggest Long-Range Cooperativity

The prevalence of intra-chromosomally coordinated NL contacts raised the possibility that multiple LADs across a chromosome associate with the NL in a cooperative manner. In support of this, we noticed a strong correlation between the average CF of all LADs along each chromosome and the overall density of LADs on the same chromosome (Figure 5E; Spearman’s ρ = 0.85, p = 9 × 10⁻⁷). This is illustrated, for example, by the stark contrast between chr18 and chr19: the former has a very high density of LADs, many of which have consistent NL contacts, while the latter has only a few LADs that infrequently contact the NL (Figure 3A). The tight correlation between the chromosomal density of LADs and their average CF is suggestive of chromosome-wide cooperative LAD-NL interactions.

NL Contacts Often Involve Embedding of DNA in the NL 

To gain more insight into the nature of NL contacts at the scale of single LADs, we visualized these contacts by ground state depletion (GSD) super-resolution fluorescence microscopy (Fölling et al., 2008) in combination with the m6A-Tracer method (Kind et al., 2013). We used a previously established clonal cell line (derived from HT1080 fibrosarcoma cells) that expresses inducible Dam-LmnB1 together with a GFP-tagged m6A-binding protein; this system allows direct visualization of DNA that is, or has been, in contact with the NL (Kind et al., 2013). We labeled the NL in the same cells with an antibody against LmnB1 and used GSD microscopy to obtain super-resolution two-color images.

20 hr after Dam-LmnBI induction, the m6 A-Tracer signal ex- 
hibits a striking speckled pattern that is mostly confined to a 
zone of ~1 |im underneath the NL (Figures 6A and 6B). The signal 
consists of clusters with diameters in the range of approximately 
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Figure 5. Intra-Chromosomal Coordination of NL Contacts over a Wide Range of Distance 

(A) NL contact correlation matrix for part of chrl 3 in clone #1 4, showing the Pearson correlation of binary NL contacts across the 118 cells for all possible pairs of 
genomic segments. The color key is shown in panel (C). White arrows mark two example regions located ^26 Mb apart that exhibit coordinated contacts (white 
box); the open arrow indicates a region that is not positively correlated with these regions (gray boxes). 

(B) Genome-wide coordinated NL contacts as a function of linear distance. Median, 90 th percentile, and 99 th percentile Pearson’s r values are shown for real 
(cyan) and randomized (gray) binary NL contact data from 118 single cells. 

(C) Comparison of NL contact coordination matrix to Hi-C interaction matrix. The matrix in (A) was turned clockwise by 45° and only a section below the diagonal is 
shown; the corresponding part of the Hi-C interaction matrix (white-red color scale) is juxtaposed to facilitate comparison. Gray lines mark a repetitive region. 

(D) Genome-wide correlation between NL contact coordination and Hi-C interactions, plotted as function of linear distance along chromosomes. Dots mark 
distances at which the correlation is significantly different from zero (p < 0.001). 

(E) Positive correlation (Spearman’s p = 0.85, p = 8.96 x 1 0 7 ) between the fraction of each chromosome covered by LADs and the average CF in these LADs. 
See also Figure S4. 



50-300 nm. A similar pattern is seen 10 hr after Dam-LmnBI in- 
duction (Figure S5A), although the signals are sparser than after 
20 hr. Many of the m6 A-Tracer signals do not touch the NL 
directly, indicating that they represent loci that contacted the 



NL in the recent past and subsequently moved over a short dis- 
tance toward the nuclear interior, as noted previously (Kind et al., 
2013). Nevertheless, close association of m6 A-Tracer signals 
with LmnBI staining is frequently observed (Figure 6A). 

Figure 6. LAD-NL Interactions Involve Partial Embedding of Chromatin in the NL

(A) GSD microscopy image sections perpendicular (top) and oblique (bottom) to the NL. Red: LmnB1; green: m6A-Tracer signal 20 hr after induction of Dam-LmnB1. The large red structure in the center of the top panel may be an invagination of the NL. Scale bars represent 500 nm.

(B) Average pixel intensity of m6A-Tracer and LmnB1 signals as a function of the distance to the center of the NL. Curves show the average of four images.

(C) Quantification of the overlap of m6A-Tracer and LmnB1 signals within the confines of the NL, compared to random overlap. Data are the average of four nuclei. Error bars indicate SD.

See also Figure S5.



Figure 7. Links of CF to Gene Expression and Chromatin Composition in Single Cells

(A) Gene expression in pools of KBM7 cells (GEO: GSE56465; n = 7 independent clones) as a function of CF. Dots represent genes; purple bars show median values.

(B) Gene expression level in single clone #14 cells (mean of 96 single cells) as a function of CF. Dots represent genes; two genes with expression levels >100 are not shown. Purple bars show median values. Only genes with detectable expression in at least one cell are included.

(C) Fraction of genes with detectable expression in at least 5 out of 96 cells, as a function of CF. Genes not detected in any of the 96 cells are not counted.

(D) Links of CF to epigenome mapping data from K562 cells (ENCODE Project Consortium, 2012).

See also Figure S6.

It is noteworthy that the edge of the LmnB1 signal at the NL is less sharply defined on the nucleoplasmic side than on the inner nuclear membrane side (Figures 6A and 6B). This could indicate that lamin filaments extend to varying degrees into the nuclear interior, creating a somewhat fuzzy surface. Interestingly, the m6A-Tracer signals that abut the NL often appear partially embedded in this LmnB1 meshwork (Figure 6A, top). Oblique sections show that m6A-Tracer signals tend to occupy small pockets in the LmnB1 signal (Figure 6A, bottom). Indeed, quantitative analysis of oblique sections showed that, within the confines of the NL, the overlap between m6A-Tracer clusters and LmnB1 is less than expected by chance (Figure 6C). This embedding is not caused by thickening of the NL due to expression of Dam-LmnB1, because the thickness of the NL is similar to that of cells in which Dam-LmnB1 is not induced (Figure S5B). Together, these results show that contact of LADs with the NL often involves embedding of chromatin in the relatively fuzzy nucleoplasmic surface of the NL.
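The comparison of observed and random overlap in Figure 6C can be illustrated with a simple pixel-level test. The randomization used in the paper is not described here, so this is an assumed stand-in: both channels are taken as flattened binary pixel masks restricted to the NL region, the analytic expectation assumes independent placement within that region, and the permutation test shuffles tracer pixels.

```python
import random
from typing import List, Tuple

def overlap_vs_random(tracer: List[bool], lamin: List[bool],
                      n_perm: int = 1000, seed: int = 0) -> Tuple[int, float, float]:
    """Return (observed overlap, expected overlap, fraction of shuffles <= observed).

    Both masks are flattened pixels inside the NL region (assumption).
    """
    n = len(tracer)
    observed = sum(t and l for t, l in zip(tracer, lamin))
    expected = sum(tracer) * sum(lamin) / n  # independent placement within the region
    rng = random.Random(seed)
    shuffled = list(tracer)
    at_most = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random relocation of tracer pixels
        if sum(t and l for t, l in zip(shuffled, lamin)) <= observed:
            at_most += 1
    return observed, expected, at_most / n_perm
```

A small returned fraction indicates less overlap than expected by chance, which is the pattern reported for m6A-Tracer and LmnB1. Because shuffling single pixels ignores the clustered structure of both signals, a cluster-aware randomization would be a more conservative design choice.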



Links between NL Contacts, Single-Cell Gene Expression, and Chromatin State

Because most NL interactions have been linked to gene repression (Finlan et al., 2008; Guelen et al., 2008; Reddy et al., 2008; Peric-Hupkes et al., 2010), we asked how the transcriptional activity of genes is linked to their CF. Analysis of publicly available gene expression profiles from pools of KBM7 cells revealed that with increasing CFs the distribution of gene expression levels shifts gradually toward lower values (Figure 7A). To investigate this further, we employed a modified CEL-seq method (Grün et al., 2014) to generate genome-wide mRNA expression profiles from 96 single KBM7 cells. These data show that the average mRNA expression level, as well as the fraction of genes with detectable mRNA, declines with increasing CF values (Figures 7B and 7C), in agreement with the cell pool expression analysis. Thus, in general, the more stably a gene is associated with the NL, the less active it tends to be. We considered that genes with mid-range CFs (i.e., genes that associate with the NL in only a subset of cells) might exhibit more cell-to-cell variation in gene expression. Analysis of the CEL-seq data did not uncover such a relationship (Figures S6A and S6B), although we note that technical noise in CEL-seq data may obscure such biological noise, particularly because NL-associated genes are generally expressed at very low levels.
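The binned summaries behind Figures 7B and 7C can be sketched as follows, assuming per-cell expression counts for each gene and a CF for each gene. The ten equal-width CF bins and the function names are illustrative assumptions. The filters follow the figure legends: genes with no detectable expression in any cell are dropped, and a gene counts as detected when it is expressed in at least 5 cells.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple

def bin_index(cf: float, n_bins: int = 10) -> int:
    """Equal-width CF bin; CF = 1.0 is placed in the last bin."""
    return min(int(cf * n_bins), n_bins - 1)

def summarize_by_cf(cf: Dict[str, float], counts: Dict[str, Sequence[int]],
                    n_bins: int = 10, min_cells: int = 5) -> Dict[int, Tuple[float, float]]:
    """Per CF bin: (median of per-gene mean expression, fraction of genes detected
    in >= min_cells cells). Genes with zero counts in every cell are excluded."""
    bins: Dict[int, List[str]] = {}
    for gene, c in counts.items():
        if gene in cf and any(x > 0 for x in c):
            bins.setdefault(bin_index(cf[gene], n_bins), []).append(gene)
    summary = {}
    for b, genes in sorted(bins.items()):
        means = [sum(counts[g]) / len(counts[g]) for g in genes]
        detected = [sum(1 for x in counts[g] if x > 0) >= min_cells for g in genes]
        summary[b] = (median(means), sum(detected) / len(genes))
    return summary
```

With 96 cells per gene, the first element of each tuple corresponds to the per-bin medians (purple bars) in Figure 7B and the second to the detection fractions in Figure 7C.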

Finally, we investigated whether CFs are linked to the presence of specific histone modifications. Although no genome-wide maps of these modifications are available for KBM7 cells, extensive datasets are available from K562 cells (ENCODE Project Consortium, 2012), another myeloid leukemia cell line. These data reveal clear negative correlations between CFs and 10 out of 11 tested histone marks, as well as the histone variant H2A.Z (Figure 7D), most of which have previously been linked to active transcription. Only H3K9me3, which is generally linked to gene repression (ENCODE Project Consortium, 2012), is positively correlated with CF values (Spearman’s ρ = 0.60, p = 1.7 × 10⁻¹²). This is consistent with the previously reported role of H3K9 methylation in NL tethering (Towbin et al., 2012; Bian et al., 2013). In contrast, H3K27me3, which is also linked to gene repression, shows a strong negative correlation with CFs, suggesting that it does not play a role in NL tethering in KBM7 cells.

DISCUSSION 

Single-Cell Protein-DNA Interaction Mapping by DamID 

Here, we demonstrate that a modified DamID protocol can be used to map protein-DNA contacts genome-wide in single cells. At present, the resolution of the resulting maps is approximately 100 kb, which is suited to the study of proteins that form large domains along the genome (Bickmore and van Steensel, 2013). We expect that the resolution may be further improved by optimization of the DamID protocol and by deeper sequencing of samples so that more unique reads are recovered. It will be of interest to integrate single-cell DamID mapping with the recently reported single-cell Hi-C approach (Nagano et al., 2013).

Most Chromosomes Have a “Backbone” of Consistent NL Interactions

One interesting outcome of this study is that about 15% of the genome contacts the NL in most of the cells (CF > 80%). We propose that these consistently interacting loci together form a “backbone” that may help to shape the overall chromosome architecture. Their strong overlap with constitutive LADs suggests a common backbone function in many cell types. The extremely low gene density within these loci suggests that they may have evolved to play a structural role. Not all cLADs have high CFs, but because virtually all chromosomes have multiple cLADs, the backbone may be robust due to redundancy. The precise distribution of loci with stable NL contacts may be critical, as suggested by the previously reported strong evolutionary conservation of the boundaries of cLADs (Meuleman et al., 2013). It will be of interest to investigate the consequences of deleting such stable contact sites from chromosomes.

Intra-chromosomal Coordination of Variable NL Contacts and Spatial Proximity

Scattered between these stable contact sites are many loci that associate with the NL in only a subset of cells. Such loci may be subject to a balance between mechanisms that tether them to the NL and mechanisms that sequester them in the nuclear interior. These variable loci, which together cover about one-third of the genome, exhibit a complex pattern of coordinated NL contacts within chromosomes. This domain-like pattern presumably reflects the overall chromosomal architecture. Indeed, our Hi-C analysis shows that loci with intra-chromosomally coordinated NL contacts tend to be in close proximity in nuclear space, particularly in the 0.5-5 Mb distance range. Physical interactions between these loci may facilitate the coordination of their NL contacts. However, it is also possible that loci with coordinated NL contacts are more often in spatial proximity to one another simply because they are located in the same nuclear compartment.

Multivalent Interactions and Embedding of Chromatin in the NL

The non-random long runs of NL contacts observed in our single-cell maps strongly suggest a multivalent mode of interaction. Considering that both the NL and chromosomes consist of polymer structures, such a mechanism is quite plausible. The Mb range over which we find this mechanism to be active appears at odds with a report claiming that a short 400-bp repetitive sequence was sufficient to target a locus to the NL (Zullo et al., 2012). One possibility is that this sequence was integrated as a long tandem repeat, which often happens in stable transfections. Another study identified three independent “peripheral targeting regions” in a human LAD, one of which could be narrowed down to 6.3 kb (Bian et al., 2013). However, this element was unable to target a free plasmid to the NL, and in its genomic context its ability to interact with the NL was strongly dependent on the integration site, indicating that this element requires support from other elements.

H3K9me3 is strongly correlated with CF values, and several studies have found that di- and tri-methylation of H3K9 can promote NL interactions (Towbin et al., 2012; Bian et al., 2013; Kind et al., 2013). These histone modifications tend to cover large genomic regions (Bickmore and van Steensel, 2013) and are thus probable candidates for involvement in multivalent chromatin-NL contacts.

The largest m6A-Tracer clusters (~300 nm in size) that we observed by super-resolution microscopy may contain ~100-250 kb of DNA, a rough estimate that we infer from previous polymer modeling of FISH distance measurements in mammalian nuclei (Mateos-Langerak et al., 2009; Giorgetti et al., 2014). However, most m6A-Tracer clusters are substantially smaller. We therefore suggest that a typical long (>1 Mb) contact run as observed in the single-cell maps may consist of a string of such clusters, each typically shorter than 100 kb, that contact the NL in a multivalent manner. Future improvement of the resolution of single-cell DamID mapping may enable the identification of individual m6A-Tracer clusters at the sequence level.

EXPERIMENTAL PROCEDURES 

Detailed methods are described in the Supplemental Experimental 
Procedures. 

Single-Cell DamID

The protocol for the detection of Dam methylation in single-cell DamID is similar to that of conventional DamID (Vogel et al., 2007) and uses largely the same reagents. Key differences are (1) use of clonal cell lines with controlled expression of Dam-LmnB1, (2) use of the Fucci system and flow sorting to collect single cells at the G1/S transition, (3) the solution used to lyse the cells, (4) performance of all enzymatic steps to detect Dam methylation in a single well of a 96-well plate, by sequential addition of reagents without any purification of the DNA, (5) 4-6 additional PCR cycles, and (6) use of Illumina sequencing instead of microarrays as readout. Multiplexing of samples was done with indexing primers as listed in Table S2. A detailed description is provided in the Supplemental Experimental Procedures, which also document the processing of sequencing reads.

Hi-C, Gene Expression Analysis, and CEL-Seq 

Hi-C data from clone #14 cells were generated in duplicate and processed as described (Belton et al., 2012; Lajoie et al., 2015). Gene expression profiles from pools of KBM7 cells were obtained from GEO: GSE56465. CEL-seq was done essentially as previously described (Grün et al., 2014).

GSD Microscopy 

Super-resolution microscopy was performed with a Leica SR GSD microscope (Leica Microsystems) with a Sumo Stage (#11888963) for drift-free imaging. Images were collected with an Andor iXon EMCCD camera (Andor Technology) and an oil-immersion objective (PL Apo 160×, NA 1.46). Lasers used included 405 nm/30 mW (back-pumping only), 488 nm/300 mW, and 647 nm/500 mW. Between 10,000 and 50,000 frames were collected at 100 Hz for each super-resolution image. Data were analyzed with the ImageJ ThunderSTORM analysis module (Ovesný et al., 2014).

ACCESSION NUMBERS 

The single-cell and conventional DamID, Hi-C, and CEL-seq data reported in this paper have been deposited in GEO under accession numbers GSE69423, GSE69841, and GSE68596.

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Supplemental Experimental Procedures, six figures, two tables, and five data files and can be found with this article at http://dx.doi.org/10.1016/j.cell.2015.08.040.

SUMMARY

Short- and long-distance circadian communication is essential for integration of temporal information. However, a major challenge in plant biology is to decipher how individual clocks are interconnected to sustain rhythms in the whole plant. Here we show that the shoot apex is composed of an ensemble of coupled clocks that influence rhythms in roots. Live-imaging of single cells, desynchronization of dispersed protoplasts, and mathematical analysis using barycentric coordinates for high-dimensional space show a gradation in the strength of circadian communication in different tissues, with shoot apex clocks displaying the highest coupling. The increased synchrony confers robustness of morning and evening oscillations and particular capabilities for phase readjustments. Rhythms in roots are altered by shoot apex ablation and micrografting, suggesting that signals from the shoot apex are able to synchronize distal organs. As with the mammalian suprachiasmatic nucleus, shoot apexes play a dominant role within the plant hierarchical circadian structure.



INTRODUCTION 

The circadian clock is a cellular mechanism able to generate rhythms in biological processes. A key function of circadian clocks is the synchronization of metabolism, physiology, and development in anticipation of diurnal and seasonal environmental changes (Young and Kay, 2001). In recent years, biochemical and genetic studies have provided a complex view of circadian organization and function in several clock systems, including mammals, insects, plants, fungi, and cyanobacteria (Wijnen and Young, 2006). Rhythms in most organisms are generated by reciprocal regulation among core clock components that produce 24-hr oscillations in gene expression, mRNA processing, protein abundance, and activity (Harmer et al., 2001). Changes in chromatin architecture have also emerged as a central mechanism coupled to the rhythmic oscillation of clock gene expression (Nakahata et al., 2007; Ripperger and Merrow, 2011; Stratmann and Mas, 2008).

As sessile organisms, plants perceive and adapt to environmental changes for optimal growth and survival. Consistent with this, nearly all stages of plant development and many essential aspects of growth and metabolism are regulated by the clock (de Montaigu et al., 2010; Yakir et al., 2007). Processes controlled by the clock include photo-protection, responses to biotic attacks, and the photoperiodic regulation of flowering (Kinmonth-Schultz et al., 2013). Mechanistically, a number of regulatory transcriptional modules have been defined at the basis of the Arabidopsis thaliana circadian oscillator. Two single MYB-domain transcription factors expressed early in the morning, CIRCADIAN CLOCK ASSOCIATED 1 (CCA1) (Wang and Tobin, 1998) and LATE ELONGATED HYPOCOTYL (LHY) (Schaffer et al., 1998), negatively regulate the expression (Alabadi et al., 2001) of the evening-phased PSEUDO-RESPONSE REGULATOR 1 (PRR1), also known as TIMING OF CAB EXPRESSION 1 (TOC1) (Makino et al., 2002; Strayer et al., 2000). TOC1 (Gendron et al., 2012; Huang et al., 2012) and the other members of the PRR family (PRR5, PRR7, and PRR9) (Nakamichi et al., 2010) also bind to the promoters of CCA1 and LHY to repress their expression. Additional components such as EARLY FLOWERING3 (ELF3), ELF4, and LUX ARRHYTHMO (LUX) interact to form the Evening Complex (EC), which represses the expression of the early day-phased clock gene PRR9 (Helfer et al., 2011; Nusinow et al., 2011).

At a cellular level, it has been assumed that virtually every plant cell might contain an endogenous clock. However, whether these cellular clocks communicate or are coupled has been a matter of debate. Circadian analysis using cell cultures (Kim and Somers, 2010; Nakamichi et al., 2003), recordings of different rhythmic markers (Sai and Johnson, 1999), studies of clock synchronization (Wenden et al., 2012), and circadian characterization of guard cells (Yakir et al., 2011) have suggested that plant cellular clocks might be only weakly coupled. However, luminescence assays in Arabidopsis and analysis of chlorophyll fluorescence in Kalanchoe daigremontiana have shown a certain degree of cellular coupling in different parts of leaves (Fukuda et al., 2007; Rascher et al., 2001). An interesting recent report has also described particular properties of clocks in leaf veins that are able to communicate with the adjacent leaf mesophyll cells (Endo et al., 2014). Intercellular coupling raises the question of long-distance signaling and synchronization. Indeed, circadian oscillations in roots seem to be entrained by signals from shoots (James et al., 2008). This situation resembles that of the mammalian circadian system, in which a master clock located in the suprachiasmatic nucleus (SCN) synchronizes peripheral clocks dispersed throughout the body (Aton and Herzog, 2005; Welsh et al., 2010).

Figure 1. Disparity in the Precision and Robustness of Circadian Rhythms in Various Organs Excised from the Plant

(A) Schematic drawing depicting the dissection of the different parts of the plant and the subsequent analysis by luminescence assays. Seedlings were dissected to separate shoots, hypocotyls, roots, and leaves.

(B-I) In vivo circadian analysis of luminescent rhythms under LL from CCA1::LUC (B, D, F, and H) and TOC1::LUC (C, E, G, and I) in shoots (B and C), hypocotyls (D and E), roots (F and G), and leaves (H and I). Data are the means ± SEM of the luminescence of 6-12 individual samples. Values of luminescence signals from hypocotyls, roots, and leaves are represented on the right y axes. See also Figure S1.

The functional structure of a circadian system consists of a complex assembly of components and mechanisms that are precisely coordinated in cells, tissues, and organs. Intercellular coupling of circadian clocks might provide an efficient way to synchronize a particular tissue locally, while long-distance signaling can help synchronize distal parts. In this study, we focused on these two aspects of circadian communication in Arabidopsis and found that the shoot apex might act as a master clock that influences rhythms in roots.

RESULTS 

Differences in Robustness and Precision of Circadian 
Rhythms in Dissected Organs 

To determine organ-specific circadian function, we analyzed rhythms in different organs excised from the plant (Figure 1A and Supplemental Experimental Procedures). Promoter activity was monitored by in vivo luminescence assays of plants expressing the morning-phased (CCA1) and evening-phased (TOC1) gene promoters fused to the LUCIFERASE (LUC) reporter. Under constant light conditions (LL), CCA1::LUC and TOC1::LUC expression in excised shoots oscillated robustly without evident dampening. Circadian waveforms closely matched those of whole plants (Figures 1B and 1C), suggesting that root excision did not manifestly affect oscillations in shoots. Excised hypocotyls sustained rhythms, albeit with a long circadian period (27.02 ± 0.64 hr versus 24.61 ± 0.25 hr in entire plants) and a progressive decrease in amplitude over the days (Figures 1D and 1E). Rhythms in excised roots were sustained for only about 2 days and damped afterward (Figures 1F and 1G). Because excised roots are a sucrose sink, the failure of root oscillations to persist in the absence of sucrose could be due to energy limitation. Indeed, the same root excision procedure on medium containing sucrose revealed that rhythms were sustained for more than 4 days (Figure S1), with a significantly longer period (26.21 ± 0.33 hr) than in shoots (24.63 ± 0.22 hr). These sustained oscillations suggest that the excision per se was not responsible for the dampened rhythms observed without sucrose. Adding sucrose to arrhythmic excised roots from plants grown without sucrose did not restore the oscillatory pattern (Figure S1), suggesting that sugar cannot compensate for the arrhythmia. When excised leaves were analyzed in the absence (Figures 1H and 1I) or presence (Figure S1) of sucrose, we observed an average
advanced phase compared to entire plants or shoots. 
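The period values above come from fits to luminescence time series. The exact procedure is given in the Supplemental Experimental Procedures and is not reproduced here. The sketch below shows only the general idea, assuming a single sinusoidal component and no amplitude damping. It fits a least-squares cosinor at each candidate period and keeps the best-fitting period. The function names, the 18-34 hr search window, and the synthetic trace are hypothetical choices made for illustration.

```python
import math


def cosinor_rss(times, values, period):
    """Residual sum of squares of y ~ mesor + a*cos(wt) + b*sin(wt) for one period."""
    w = 2 * math.pi / period
    c = [math.cos(w * t) for t in times]
    s = [math.sin(w * t) for t in times]
    n = len(times)
    # Centering regressors and response absorbs the intercept (mesor) term.
    mc, ms, my = sum(c) / n, sum(s) / n, sum(values) / n
    c = [x - mc for x in c]
    s = [x - ms for x in s]
    y = [v - my for v in values]
    scc = sum(x * x for x in c)
    sss = sum(x * x for x in s)
    scs = sum(x * z for x, z in zip(c, s))
    scy = sum(x * z for x, z in zip(c, y))
    ssy = sum(x * z for x, z in zip(s, y))
    det = scc * sss - scs * scs
    # Solve the 2x2 normal equations for the cosine and sine weights (Cramer's rule).
    a = (scy * sss - ssy * scs) / det
    b = (ssy * scc - scy * scs) / det
    return sum((yi - a * ci - b * si) ** 2 for yi, ci, si in zip(y, c, s))


def estimate_period(times, values, min_period=18.0, max_period=34.0, step=0.01):
    """Return the candidate period (hr) whose cosinor fit leaves the smallest residual."""
    n_steps = int(round((max_period - min_period) / step))
    candidates = [min_period + k * step for k in range(n_steps + 1)]
    return min(candidates, key=lambda p: cosinor_rss(times, values, p))


if __name__ == "__main__":
    # Hypothetical hourly trace over 5 days with a 27 hr period.
    t = list(range(120))
    y = [100 + 50 * math.cos(2 * math.pi * ti / 27 + 0.3) for ti in t]
    print(round(estimate_period(t, y), 2))
```

A grid scan is transparent but coarse. Dedicated circadian analysis tools usually also model the amplitude damping seen in Figure 1, which this sketch ignores.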

Figure 2. High Degree of Synchrony and Responsiveness to Environmental Changes of Shoot Apex Clocks

(A) Schematic drawing depicting the rhythmic analysis of excised shoot apexes.

(B and C) In vivo circadian analysis of luminescent rhythms under LL from CCA1::LUC (B) and TOC1::LUC (C) in shoot apexes.

(D and E) TOC1::LUC luminescence traces of individual excised shoot apexes (D) and excised leaves (E).

(F and G) Analysis of the phase synchrony among the different samples (blue crosses) of individual shoot apexes (F) and leaves (G) examined from 26 hr to 36 hr under LL. The red crosses indicate the means or circular variance (Mormann et al., 2000) at each time point.

(H and I) Average rhythms of TOC1::LUC luminescence in shoot apexes (H) and leaves (I) subjected to a "jet-lag" experiment, with an extended 12 hr darkness (extended night) at dawn.

Data are the means ± SEM of the luminescence of 6-12 samples. White boxes: light; shaded boxes: dark. See also Figure S2.

Specific Properties for Synchronization and Phase Readjustments of Shoot Apex Clocks

We next performed a similar analysis with excised shoot apexes (Figure 2A) and found that the phase, period, and amplitude remained synchronized (Figures 2B and 2C). Rhythms were very similar to those of entire plants (Figure S2), and individual waveforms were highly synchronous (Figure 2D). These results contrast clearly with the high variability of individual leaf waveforms, which showed a range of phases and amplitudes from the very first day under LL (Figure 2E). Because tissue size might influence the circadian waveforms, we analyzed small sections of leaves, similar in size to the shoot apexes. These sections were as variable as full leaves (Figure S2), which suggests that the homogeneous waveforms of shoot apexes are not due to the reduced size of the samples. Circadian phases clustered together in shoot apexes and to a much lesser extent in leaves (Figures 2F and 2G). We reached similar conclusions when we calculated the average phase and the degree of phase coherence with the synchronization index "R" (see Supplemental Experimental Procedures). R values were high, close to 1, for shoot apexes and lower for leaves at all time points (Figure S2). Consistent with previous studies (Wenden et al., 2012), R values in leaves were well above 0, which suggests a certain degree of coherence. Rhythms in excised organs were highly reproducible across four biological replicates (each with 6-12 samples), which makes it less likely that the results were due to indirect effects of the excision
procedure. 
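The synchronization index R summarizes phase coherence across samples. The sketch below assumes R is the length of the mean unit vector of the sample phases on a 24 hr circle. This is the mean resultant length of circular statistics, and 1 - R is the circular variance. The helper name and example phases are hypothetical, and the definition in the Supplemental Experimental Procedures may differ in detail.

```python
import cmath
import math


def synchronization_index(phases_hr, period_hr=24.0):
    """Return (R, mean_phase_hr) for a set of peak phases in hours.

    Each phase becomes a unit vector on the circle. R is the length of their
    average (1 = identical phases, near 0 = uniformly scattered). The mean
    phase is the direction of that average, converted back to hours.
    """
    if not phases_hr:
        raise ValueError("need at least one phase")
    vectors = [cmath.exp(1j * 2 * math.pi * p / period_hr) for p in phases_hr]
    mean_vector = sum(vectors) / len(vectors)
    r = abs(mean_vector)
    mean_phase = (cmath.phase(mean_vector) / (2 * math.pi) * period_hr) % period_hr
    return r, mean_phase


if __name__ == "__main__":
    shoot_apex_like = [30.1, 30.4, 29.8, 30.2]  # hypothetical tightly clustered peaks (hr)
    leaf_like = [27.0, 31.5, 34.0, 29.0]        # hypothetical scattered peaks (hr)
    print(synchronization_index(shoot_apex_like))
    print(synchronization_index(leaf_like))
```

Averaging on the circle rather than on the time axis matters near the wrap-around point. Peaks at 23 hr and 1 hr average to 0 hr, not 12 hr.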

The circadian clock is not only a robust mechanism able to sustain rhythms in the absence of environmental transitions but also a flexible system that resynchronizes and properly adjusts to changes in the environmental cycle (Harrington, 2010). To explore whether shoot apexes and leaves also differ in their capacity for resynchronization and phase adjustment, we performed "jet-lag" experiments. In shoot apexes, rhythms resynchronized with timing similar to that of entire plants (Figure 2H). However, shoot apex waveforms declined very rapidly at night for TOC1::LUC and showed an increased acute induction at dawn for CCA1::LUC (Figure S2). In leaves, rhythms showed a double peak for the first 2 days and reached a stable phase on the third day after the extended night (Figure 2I). These results reveal different synchronizing behavior in leaves and shoot apexes. The specific waveforms of shoot apexes, compared with the entire plant, might also indicate a particular sensitivity of shoot apexes to dawn and dusk resetting signals.

Conserved Molecular Architecture of the Circadian 
Network at the Shoot Apex Clocks 

To determine organ-specific differences in the molecular composition of the clock, we examined whether clock outputs and mutations in core clock genes were regulated differently in shoot apexes and leaves. In WT plants expressing the morning-phased clock output CAB2 (CHLOROPHYLL A/B-BINDING PROTEIN 2) (Millar et al., 1995), the phase in shoot apexes was comparable to that in the entire plant, whereas leaves showed increased heterogeneity and an advanced average phase (Figure 3A). As in entire plants, the shoot apexes and leaves of cca1-11 mutants displayed persistent rhythms with shorter periods than WT shoot apexes and WT leaves, respectively (Figures 3B-3D). Similarly, the short period of the evening-expressed clock output CCR2 (COLD, CIRCADIAN RHYTHM, AND RNA BINDING 2) (Strayer et al., 2000) in TOC1 RNAi plants (Huang et al., 2012) was also observed in shoot apexes and leaves (Figures 3E and 3F). Therefore, circadian gene expression in shoot apexes and leaves, assessed with various reporter lines and clock mutant backgrounds, did not reveal major differences between the two
organs.

Figure 3. Phenotypes of Core Clock Mutations in Shoot Apexes and Leaves

Average rhythms of CAB2::LUC (A-D) and CCR2::LUC (E and F) luminescence under LL in entire plants, shoot apexes, and leaves of WT, cca1-11 mutants (A-D), and TOC1 RNAi (E and F). Plants were entrained under LD cycles and processed as detailed in the Supplemental Experimental Procedures. Data are the means ± SEM of the luminescence of 6-12 samples. Values of luminescence signals from cca1-11 mutant and TOC1 RNAi are represented on the right y axes.

To profile the circadian transcriptional landscape at the shoot apex, we performed RNA sequencing (RNA-seq) analysis and used the JTK_CYCLE algorithm to identify circadian expression precisely (Hughes et al., 2010). After filtering out transcripts whose median expression across all samples was lower than 0.69 RPKM, and those not differentially expressed, we identified over 1,400 genes with significant circadian fluctuations in mRNA abundance. Visual inspection of the data suggested that this may be a conservative estimate. However, the stringent analysis ensured that only the highest-confidence circadian hits were selected. Rhythmic genes included all the previously described core clock components, genes involved
in light signaling, and those involved in circadian outputs 
such as photosynthesis, photoperiodic flowering, and hormone 
signaling, among others (Figures 4A-4D and S3). The waveforms oscillated with phases and amplitudes similar to those previously reported in entire plants (Figures 4E-4G). This suggests no fundamental differences between the global circadian transcriptional networks of the shoot apex and the entire plant. Notably, shoot apexes display strong and robust rhythms of both morning- and evening-expressed genes. This contrasts with the uncoupled rhythms reported in roots (only morning) (James et al., 2008) and in veins (mainly evening) (Endo et al., 2014). Functional categorization showed that the rhythmic genes span a wide range of biological functions. The most significantly enriched were genes involved in circadian rhythms and in responses to environmental conditions, including different qualities of light, temperature, and radiation (Figure S3). This enrichment might explain the specific readjustment of shoot apexes to environmental changes observed in our jet-lag
experiments. 
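JTK_CYCLE (Hughes et al., 2010) combines the Jonckheere-Terpstra test with Kendall's tau. It scores each expression profile against reference waveforms of different periods and phases. The sketch below illustrates only that rank-correlation idea plus the RPKM median filter described above. It is not the JTK_CYCLE implementation. It omits the exact null distribution of the test statistic, the multiple-testing correction across phases and periods, and the differential-expression filter. It assumes a single 24 hr period, and all function names are hypothetical.

```python
import math
from statistics import median


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def passes_expression_filter(rpkm_values, threshold=0.69):
    """Keep a transcript only if its median RPKM across samples reaches the threshold."""
    return median(rpkm_values) >= threshold


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation, handling ties in either series (O(n^2))."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both series: excluded from tau-b
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x) * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else 0.0


def best_cosine_phase(times_hr, expression, period_hr=24.0, phase_step_hr=1.0):
    """Scan reference cosines peaking at each candidate phase; return (tau, phase_hr)."""
    best = (-2.0, None)
    for k in range(int(round(period_hr / phase_step_hr))):
        phase = k * phase_step_hr
        # Rounding makes time points symmetric about the peak tie exactly.
        reference = [round(math.cos(2 * math.pi * (t - phase) / period_hr), 9)
                     for t in times_hr]
        tau = kendall_tau_b(reference, expression)
        if tau > best[0]:
            best = (tau, phase)
    return best


if __name__ == "__main__":
    times = list(range(0, 48, 4))  # hypothetical sampling every 4 hr for 2 days under LL
    profile = [10 + 5 * math.cos(2 * math.pi * (t - 8) / 24) for t in times]
    print(best_cosine_phase(times, profile))
```

Because Kendall's tau uses only ranks, the score ignores absolute expression level. That is one reason an explicit expression filter such as the 0.69 RPKM median cutoff is applied first.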

Differences in Synchrony of Clock Cells in Various 
Organs and Tissues 

To understand the cellular basis of circadian rhythmicity at the shoot apex, we examined rhythms in individual cells of plants expressing CCA1-HA-EYFP under its own promoter (Yakir et al., 2009). We performed in vivo time-course analysis by confocal imaging of excised shoot apexes embedded in agarose (Mas and Beachy, 1998). Fluorescent signals from individual nuclei of shoot apex cells sustained rhythmic oscillations. The circadian waveforms stayed well synchronized, with similar timing of their rising and declining phases even after 3 days under LL (Figure 5A, left panel, and Figure 5B). The same result was obtained when confocal imaging started at different time points (Figure S4). Single cells from shoot apexes of plants expressing FLAG-PRR7-EGFP (Nakamichi et al., 2010) showed a similar pattern of highly synchronous waveforms (Figure S4). In contrast, and consistent with previous data (Yakir et al., 2011), the variation in the rhythmic accumulation of CCA1-HA-EYFP in individual leaf cells increased significantly after 2 days under LL (Figure 5A, right panel, and Figure 5C). Differences in phase and amplitude were also clearly observed when fluorescent signals were not normalized to the maximum (Figure S4). We also measured fluorescence from the leaf vasculature, because previous studies have shown that these cells are coupled (Endo et al., 2014). We observed two distinguishable populations with slightly different phases (Figure S4). Cell-to-cell comparisons showed that both populations maintain a certain degree of synchrony (Figures 5D and 5E). This synchrony appeared higher than in leaf mesophyll cells but lower than in shoot apex cells. Quantitative analysis of the waveform correlation among individual cells confirmed that the correlation coefficient was higher in shoot apex cells than in vascular cells with either the advanced (A) or delayed (D) phase (Figures 5F and 5G). Cells with a delayed phase appeared more synchronous than cells with an advanced phase. Waveforms in leaf mesophyll cells displayed lower correlation values and increased heterogeneity. Shoot apexes were also more synchronous than vascular cells or the mesophyll cells adjacent to the leaf veins (Figure S4) when we examined an evening-expressed gene (ELF3-EYFP) (Dixon et al., 2011). In this case, the separation of cells with advanced and delayed phases was not so evident in
veins (Figure S4). Together, these results confirm our conclusions on the distinct degrees of synchrony in shoot apexes, leaf mesophyll cells, and veins, at the single-cell level and with three different reporters.

Figure 4. Transcriptional Profiling of the Circadian Program at the Shoot Apex

(A) Heatmap showing median-normalized gene expression at different circadian times (CT, vertical axis) for transcripts (horizontal axis) with a peak phase of expression at mid-late subjective night. Yellow indicates high expression and blue low expression.

(B-D) Gene-expression analysis of CCA1, LHY (B), PRR3, PRR7, TOC1 (C), and GI, FKF1, CDF2 (D) in shoot apexes of WT plants grown under LD cycles followed by 2 days under LL.

(E and F) Phase distribution of rhythmic genes in shoot apexes and entire plants. Phase enrichment was calculated using the web-based tool "Phaser." The phase estimates are represented relative to their maximum (E) and in pie charts (F) displaying the contribution of each phase to the total. Left chart: shoot apex; right chart: entire plants.

(G) Distribution of amplitudes of cycling transcripts in shoot apexes, calculated with the JTK_CYCLE algorithm.

See also Figure S3.

Intercellular Circadian Coupling among Clock Cells of the Shoot Apex

If coupling of shoot apex clocks is responsible for the waveform synchrony, then rhythms should be affected when intercellular communication is disrupted. To explore this idea, we compared intact shoot apexes with dissociated and diluted shoot apex protoplasts. Rhythms in excised shoot apexes stayed well synchronized and were sustained for several days. However, in diluted shoot apex protoplasts, the oscillations persisted for only 2-3 days and became increasingly heterogeneous over time (Figure 5H). Further dilution of protoplasts made the rhythms damp progressively earlier (Figures 5I and S4). R values in shoot apexes and in diluted protoplasts quantitatively confirmed that phase coherence in protoplasts lasted less than 2 days, after which the cells became asynchronous (Figure 5J). Individual cells at the shoot apex are able to maintain rhythmic oscillations (Figure 5B). One plausible explanation for our results is therefore that dispersed cells do not sustain rhythms because of reduced intercellular communication and subsequent desynchronization over time.

In the mammalian circadian system, the clock components PER1 and CRY1 are required for sustained rhythms in peripheral tissues and in neurons dissociated from the SCN (Welsh et al., 2010). However, cellular interactions at the SCN can compensate for Per1 or Cry1 deficiency (Evans et al., 2012; Liu et al., 2007). We found a similar scenario at the shoot apex of lux mutants. In contrast to the reported arrhythmia of lux-2 plants, lux-2 shoot apexes were able to sustain rhythms to a certain degree.
Although the rhythms were clearly compromised, rhythmicity at the lux-2 shoot apex was better than in leaves (Figures 5K, 5L, and S4). Thus, the absolute requirement for LUX function in leaves is not so apparent in shoot apexes. These differences are not due to changes in the circadian expression of LUX or of the other EC components, ELF3 and ELF4, as verified by our RNA-seq analysis and by qRT-PCR (Figure S4). If, by analogy to the mammalian system, effective intercellular coupling among shoot apex cells is responsible for this distinctive phenotype, then disrupting cellular communication should affect the rhythms. Indeed, shoot apex protoplasts from lux-2 mutants were arrhythmic throughout the time-course analysis (Figure 5M). We propose that the arrhythmic phenotype of protoplasts results from the rapid desynchronization of the
dispersed cells, each containing a semi-functional oscillator. 
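The proposal that dispersed cells lose collective rhythmicity by drifting apart can be illustrated with the Kuramoto model (Kuramoto, 1975), one of the toy models used in the coupling analysis of this study. The sketch below is a hypothetical illustration, not the authors' model. Each cell is reduced to a phase oscillator with its own free-running period. Cell number, period spread, and coupling strength K (rad/hr) are arbitrary assumptions, and amplitude damping is not modeled.

```python
import cmath
import math
import random


def simulate_order_parameter(n_cells=50, coupling=0.5, hours=192.0, dt=0.1,
                             mean_period=24.0, period_sd=1.5, seed=1):
    """Euler-integrate a mean-field Kuramoto model and return R at the end.

    dtheta_i/dt = omega_i + K * R * sin(psi - theta_i), where R*exp(i*psi)
    is the mean of exp(i*theta_j) over all cells. All cells start in phase,
    mimicking entrainment before release into constant conditions.
    """
    rng = random.Random(seed)
    omegas = [2 * math.pi / rng.gauss(mean_period, period_sd) for _ in range(n_cells)]
    thetas = [0.0] * n_cells
    for _ in range(int(round(hours / dt))):
        z = sum(cmath.exp(1j * th) for th in thetas) / n_cells
        r, psi = abs(z), cmath.phase(z)
        thetas = [th + dt * (w + coupling * r * math.sin(psi - th))
                  for th, w in zip(thetas, omegas)]
    return abs(sum(cmath.exp(1j * th) for th in thetas) / n_cells)


if __name__ == "__main__":
    print("coupled tissue:", round(simulate_order_parameter(coupling=0.5), 3))
    print("dispersed cells:", round(simulate_order_parameter(coupling=0.0), 3))
```

With K = 0, standing in for dispersed and diluted protoplasts, the population order parameter R decays toward the small value expected for random phases. Modest coupling keeps R close to 1 despite the spread of individual periods.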
Figure 5. Circadian Coupling Defines the High Synchrony of Shoot Apex Clock Cells

(A) Representative fluorescent signals from CCA1-HA-EYFP accumulation in nuclei of shoot apex cells (left panel) and leaf cells (right panel). Panels show representative cells from a larger picture containing other cells outside the field shown (scale bar, 20 µm).

(B-E) In vivo time-course imaging of CCA1-HA-EYFP fluorescent signals quantified in individual nuclei from shoot apex (B), leaf mesophyll (C), and leaf vascular cells with advanced (D) and delayed (E) phases. Data are represented relative to the maximum value.

(F and G) Correlation coefficients among the waveforms of individual nuclei in shoot apex, leaf mesophyll cells, and leaf vascular cells with advanced (F) and delayed (G) phases.

(H and I) Luminescence analysis of CCA1::LUC activity in diluted (H) and further diluted series of protoplasts (I) from shoot apexes. Protoplasts were synchronized for an additional day under LD before transferring to LL. Data are the means ± SEM of the luminescence of 6-12 samples.

(J) Quantification of the phase coherence in intact shoot apexes and in shoot apex protoplasts by calculating the synchronization index "R."
Our results indicate that intercellular communication might be important for rhythms at the shoot apex. To explore the degree of intercellular coupling mathematically, we developed a predictive model using barycentric coordinates for high-dimensional space (Hirata et al., 2015). The model uses linear programming to assign different weights to neighboring cells. It then identifies the strength of coupling from the accuracy of the predictions obtained with those weights. We first tested the method on two toy models, the Kuramoto model (Kuramoto, 1975) and coupled Rössler oscillators (Rössler, 1976). These examples showed that the weights of neighboring oscillators are higher when the coupling is stronger (Figure 5N). When we applied the model to the single-cell confocal data, we found that shoot apex clocks were highly coupled, with greater coupling strength than leaf vasculature or leaf mesophyll cells (Figure 5N). Together, the results confirmed a gradation or hierarchy in the strength of circadian communication in
different parts of the plant. 
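The single-cell comparisons in Figures 5F and 5G are summarized by correlation coefficients among the waveforms of individual nuclei. The barycentric-coordinate method above is considerably more involved (it solves linear programs over neighboring cells; Hirata et al., 2015) and is not reproduced here. The sketch below computes the mean pairwise Pearson correlation of single-cell traces. It is a simpler, swapped-in proxy that measures waveform synchrony only, not coupling strength. Function names and the synthetic traces are hypothetical.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length traces."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def pairwise_correlations(traces):
    """Correlation coefficient for every pair of single-cell traces."""
    return [pearson(a, b) for a, b in combinations(traces, 2)]


def mean_pairwise_correlation(traces):
    """Average of all pairwise correlations; higher means more synchronous waveforms."""
    values = pairwise_correlations(traces)
    return sum(values) / len(values)


if __name__ == "__main__":
    t = range(0, 72, 2)  # hypothetical imaging every 2 hr for 3 days
    # Hypothetical cells whose peaks differ by up to 1 hr versus up to 12 hr.
    in_phase = [[1 + math.cos(2 * math.pi * (ti - s) / 24) for ti in t] for s in (0, 0.5, 1)]
    scattered = [[1 + math.cos(2 * math.pi * (ti - s) / 24) for ti in t] for s in (0, 6, 12)]
    print(round(mean_pairwise_correlation(in_phase), 3))
    print(round(mean_pairwise_correlation(scattered), 3))
```

Because Pearson correlation is insensitive to each trace's baseline and scale, it compares waveform shape and timing rather than absolute fluorescence. This mirrors the normalization to the maximum used for the single-cell traces.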

Relevance of the Shoot Apex Clocks in the Modulation of 
Circadian Oscillations in Roots 

We next addressed whether the shoot apex might control circadian function in roots. We adapted the luminescence assay protocol to examine rhythms in both shoots and roots of intact plants (Figure 6A). We also used laser microdissection (LMD) to excise shoot apexes and examine rhythms in Δshoot apex plants (Figure 6B). Previous studies have reported that rhythms in entire plants damped and waveforms broadened after several days under free-running conditions (Yakir et al., 2011). We found that rhythms at the shoot apex were sustained for more than 7 days under LL (Figure 6C). This suggests that intercellular coupling at the shoot apex might contribute to rhythmic robustness over extended periods under LL. Δshoot apex plants showed an advanced average phase and increasing waveform variability, similar to excised leaves (Figures 6D and S5). Application of auxin did not noticeably affect the rhythms in shoots of entire plants or Δshoot apex plants (Figure S5). This suggests that the Δshoot apex phenotypes are not due to changes in auxin flux. Notably, rhythms in plants lacking only the shoot apex resemble those in leaves, whereas the small shoot apex sustains rhythms more precisely. Unexpectedly, we also found that rhythms in plants without cotyledons or leaves were almost indistinguishable from those in intact plants (Figures 6E and 6F).

Photosynthetic sucrose has been shown to modulate clock function (Haydon et al., 2013; James et al., 2008). In shoots of intact plants treated with the photosynthetic electron transport inhibitor 3-(3,4-dichlorophenyl)-1,1-dimethylurea (DCMU), we observed an initial phase delay and period lengthening that led to dampened rhythms (Figure S5). When we applied the drug only to shoots and checked the effects on roots, we found a phase delay and dampened rhythms (Figure S5). These results confirmed that photosynthetic signals from shoots are important for the root clock. DCMU treatment of excised shoot apexes also led to eventual dampening of rhythms, but the early phase delay observed in whole shoots and roots was less evident (Figure S5). These results suggest that the shoot apex is more robust against pharmacological perturbation of photosynthesis.

To further explore the importance of circadian communication, we used CALS3 gain-of-function mutants (cals3-d), whose reduced plasmodesmata aperture limits intercellular trafficking (Vatén et al., 2011). Blocked trafficking clearly altered the rhythmic expression of core clock genes in roots: the evident peak and trough expression of WT roots was absent (Figures 6G, 6H, and S5). We also examined rhythms in shoots and roots that were rapidly separated after 2 days of luminescence analysis of intact plants (Figure 6I). The separation led to dampening of rhythms in roots, indicating that root rhythms are altered very rapidly after separation from shoots. To ascertain the role of the shoot apex in root synchronization, we then examined circadian rhythms in roots from intact plants in which the shoot apex was removed (Δshoot apex plants) (Figure 6B). Rhythms were clearly affected, with an initial long-period phenotype that progressively led to arrhythmia (Figure 6J). Rhythms in roots from plants in which leaves and cotyledons were removed were not severely affected. They showed a slightly advanced phase compared with rhythms in roots from intact plants (Figure S5). Jet-lag experiments were also noteworthy. Roots from intact plants resynchronized with a pattern that more closely resembled that of shoot apexes than that of leaves (Figure S5).

A Hierarchical Structure at the Core of the Arabidopsis 
Clock 

Efficient micrografting of Arabidopsis seedlings is a powerful tool 
for studying long-distance signaling (Bainbridge et al., 2014). To 
conclusively determine the possible hierarchical nature of the 
plant circadian system, we performed micrografting with young 
Arabidopsis seedlings using the shoot apex as scion (Figure 7A). 
We reasoned that grafting with different genotypes would provide definitive information on the role of shoot apexes in root oscillations.

Figure 5 (continued)

(K) Average luminescence of CAB2::LUC activity in shoot apexes and leaves of lux-2 mutant plants. Data are means ± SEM of the luminescence of six samples.

(L) Period estimates of CAB2::LUC activity from individual traces analyzed as detailed in the Supplemental Experimental Procedures.

(M) Luminescence analysis of CAB2::LUC activity in protoplasts from shoot apexes of lux-2 mutant plants. Data represent means ± SEM of 6-12 samples. Protoplasts were synchronized for an additional day under LD before transferring to LL.

(N) Mathematical analysis of the coupling strength by barycentric coordinates for high-dimensional space using the Kuramoto and coupled Rössler toy models and the in vivo CCA1-EYFP imaging data. The line in the middle of the box is plotted at the median. The whiskers represent the minimum and maximum values.

See also Figure S4.

Figure 6. Rhythms at the Shoot Apex Influence the Circadian Activity in Roots

(A) Schematic drawing depicting the rhythmic analysis of shoots and roots from intact plants.

(B) LMD was used to obtain Δshoot apex plants. Seedlings were horizontally positioned in serrated 96-well microplates so that rhythms could be examined in roots and shoots.

(C) Average rhythms of TOC1::LUC luminescence in shoot apexes for extended days under LL.

(D) TOC1::LUC luminescence in plants in which the shoot apexes were removed by LMD.

(E and F) CCA1::LUC luminescence in plants in which the cotyledons (E) and leaves (F) were removed.

(G and H) qRT-PCR analysis of TOC1 (G) and CCA1 (H) expression in shoots and roots of WT and cals3 mutant plants. Plants were synchronized under LD, and samples were taken after 2 days under LL at CT3 and CT15.

(I) CCA1::LUC luminescence from roots after rapid dissection from shoots.

(J) CCA1::LUC luminescence in roots from intact plants and Δshoot apex plants. Luminescence was recorded under LL following synchronization under LD. Data are represented as the means ± SEM. See also Figure S5.

Micrografting and luminescence analysis were first tested on WT self-grafts (WT Shoot Apex-WT Roots, WT SA-WT Rt). CCA1::LUC and TOC1::LUC rhythms followed a trend similar to that observed in entire non-grafted plants (Figures 7B and 7C). Rhythms in roots exhibited a longer
period compared to shoots, which also mirrored the observations in organs of non-grafted plants (Figure S1). As these results
suggested that the grafting procedure did not manifestly alter the 
circadian oscillation, we next grafted the shoot apex of 
arrhythmic plants into a WT rootstock. We reasoned that the 
lack of a functional clock in the shoot apex should alter the 
rhythms in roots. Indeed, grafting the shoot apex of the
arrhythmic cca1-1/lhy-11 plants (Mizoguchi et al., 2002; Portoles
and Mas, 2010) (Figure S6) disrupted the rhythms of WT roots
(Figure 7D). A similar alteration of WT root rhythms was observed 
with the shoot apex of elf3-2 mutants (Hicks et al., 1996) (Fig- 
ure 7E). Although slight oscillations could be appreciated, the 
amplitude and robustness of the waveforms were clearly 
affected. These results confirmed that proper clock function in 
the shoot apex is important for the rhythmic activity in roots. 
We then performed the reverse experiment in which WT shoot 
apexes were grafted into arrhythmic rootstocks to test the ability 
of shoot apex signals to reestablish the rhythms in roots. 



Remarkably, the arrhythmia of cca1-1/lhy-11 or elf3-2 roots
could be partially restored by grafting the shoot apex of WT 
plants (Figures 7F and 7G). The oscillations were not very robust, 
but the patterns were not as arrhythmic as the roots of non- 
grafted plants (Figure S6). Although we observed variability in 
the degree of restored rhythms (Figure S6), the recovery was 
quite evident. Altogether, we conclude that signals from the 
shoot apex are important for circadian oscillations in roots. 

DISCUSSION 

The protocols developed in this study allowed us to follow rhythmic expression in excised organs of the plant. In the presence of sucrose, rhythms were sustained in all organs examined, and the tissues continued growing normally after excision. This suggests that excision did not manifestly affect the rhythms. The excised organs displayed a wide range of circadian properties. Hypocotyls and roots lack precision and robustness, showing long circadian periods and arrhythmia. Leaves lack synchrony among samples from similarly entrained plants. As roots are a sucrose sink, our results with excised roots (±sucrose) are consistent with previous studies (Haydon et al., 2013; James et al., 2008). They are also consistent with the dampening of root rhythms when shoots are treated with DCMU. Root rhythms in Δshoot apex plants resembled those of excised roots, which confirmed that roots depend on circadian communication with shoot apexes. The heterogeneity of circadian waveforms in leaves is also consistent with previous studies (Wenden et al., 2012). Phase heterogeneity might be due to differences in circadian coupling among various leaf cell types. Mesophyll cells in leaves are only weakly coupled, whereas the leaf vasculature synchronizes the neighboring mesophyll cells (Endo et al., 2014). This local synchronization raises the question of whether rhythms differ between mesophyll cells close to the vasculature and those located far from the veins. Desynchronization between leaf stomatal and mesophyll cells (Yakir et al., 2011) could be another source of phase heterogeneity in leaves.

Figure 7. A Hierarchical Dominance of the Shoot Apex Clocks

(A) Schematic drawing depicting the rhythmic analysis of micrografted plants as detailed in Experimental Procedures.

(B and C) Analysis of CCA1::LUC (B) and TOC1::LUC (C) luminescence in shoots and roots of WT scion and WT rootstocks.

(D-G) Luminescence in shoots and roots of cca1/lhy mutant scion and WT rootstocks (D), elf3 mutant scion and WT rootstocks (E), WT scion and cca1/lhy mutant rootstocks (F), and WT scion and elf3 mutant rootstocks (G). Luminescence was recorded under LL following synchronization under LD. Values of luminescence signals from roots are represented on the right y axes.

See also Figure S6.

Shoot apexes displayed remarkably homogeneous rhythmicity with highly synchronous waveforms. Among the tissues examined, three distinct levels of waveform synchrony could be distinguished: the highest in shoot apex cells, intermediate in vascular cells, and the lowest in leaf mesophyll cells. The loss of synchrony in dispersed, diluted shoot apex protoplasts suggests that the phase coherence and synchrony might be due to strong intercellular coupling among shoot apex clocks. A tailor-designed mathematical model using barycentric coordinates in a high-dimensional space confirmed this notion. The method has proven successful in a wide range of applications, from weather forecasting to the creation of musical instruments with natural sounds (Hirata et al., 2015). Our studies also revealed that intercellular coupling, or circadian communication, among shoot apex clocks confers robustness against genetic mutations and pharmacological perturbations. These properties closely resemble those of the circadian system in mammals, in which intercellular coupling among neurons of the SCN can compensate for the absence of key functional clock components (Evans et al., 2012; Liu et al., 2007).
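The text does not say how waveform synchrony was scored, and the authors' own analysis used the barycentric-coordinate method of Hirata et al. (2015), which is not reproduced here. As a simpler, swapped-in illustration of what "phase coherence" means, the sketch below computes the standard Kuramoto order parameter R from per-cell peak phases, assuming a 24 h period. R is close to 1 for tightly synchronized cells (as described for the shoot apex) and close to 0 when phases are spread around the cycle. The phase values are hypothetical.

```python
import cmath
import math


def order_parameter(phases_hours, period_hours=24.0):
    """Kuramoto order parameter R for a set of oscillator peak phases.

    R = |mean(exp(i * theta_j))|, where theta_j is each cell's peak phase
    converted to radians. R is 1.0 for perfectly synchronous cells and
    approaches 0 when phases are spread uniformly around the cycle.
    """
    if not phases_hours:
        raise ValueError("need at least one phase")
    total = sum(
        cmath.exp(1j * 2 * math.pi * p / period_hours) for p in phases_hours
    )
    return abs(total / len(phases_hours))


# Hypothetical peak phases (hours), for illustration only.
tight = [3.9, 4.0, 4.1, 4.0]     # tightly clustered, shoot-apex-like
spread = [0.0, 6.0, 12.0, 18.0]  # evenly spread around a 24 h cycle

print(round(order_parameter(tight), 3))
print(round(order_parameter(spread), 3))
```

R ignores period and amplitude differences, so it captures only phase clustering, which is the property the paragraph contrasts across tissues.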

The molecular circadian network and the phenotypes of core clock mutants at the shoot apex appear to be similar to those described in the whole plant. However, a predominance of morning- or evening-expressed genes has been shown for the clocks of leaf mesophyll cells and leaf veins, respectively (Endo et al., 2014). Uncoupled morning and evening oscillators have also been reported previously for the clock in roots (James et al., 2008). Our full time-course RNA-seq analysis showed robust rhythms of circadian genes, with peak phases and relative amplitudes similar to those reported in entire plants. The particular properties that we observed in shoot apex clocks might result from their strong intercellular coupling rather than from a distinctive molecular network. We also found a clear enrichment of genes involved in responses to environmental signals. This enrichment might be responsible for the distinctive waveforms in jet-lag experiments, as if the shoot apex clocks were highly sensitive in perceiving and responding to changing environmental conditions. The enrichment might be particularly useful for shoot apical cells, which are buried and shielded from the environment. Intercellular coupling might also aid circadian synchronization of cells with reduced light accessibility. The enrichment in our RNA-seq data of genes responsible for perceiving synchronizing signals such as light and temperature is consistent with a main role of shoot apexes as a synchronizing master clock.

Grafting has been used to study long-distance signaling in different processes, for instance shoot branching (Turnbull et al., 2002) or stress responses (Holbrook et al., 2002). The studies presented here use micrografting to demonstrate long-distance circadian signaling. Our results have revealed the influence of shoot apexes on the rhythmic activity of roots. One plausible idea is that changes in auxin flux could be responsible for synchronizing rhythms in roots. However, our results suggest that auxin signaling plays a minor role, if any, in long-distance circadian communication. The partial recovery of mutant rootstocks by grafting WT shoot apexes and, conversely, the arrhythmia of WT roots grafted with arrhythmic shoot apexes reflect the circadian hierarchy of shoot apexes. This situation is reminiscent of the circadian system in mammals, in which genetic defects in peripheral clocks are phenotypically rescued by the hierarchical dominance of the SCN (Pando et al., 2002). The micrografting results were consistent with a role for the shoot apex in shaping root rhythms, as also observed with the other approaches used in this study (rapid dissection of shoots and roots, Δshoot apex plants, pharmacological treatments, and genetic analysis). The similar phenotypes reinforce the validity of the different procedures and the consistency of our conclusions.

Based on the recently discovered role of the plant vasculature (Endo et al., 2014), an intriguing possibility is that veins serve as the circadian “highway” along which synchronizing signals travel from shoot apexes to roots. By analogy with the mammalian circadian system, shoot apex clock cells might function like SCN neurons, whereas the plant vasculature could be comparable to blood vessels. Further studies of topographically defined areas of circadian coupling, and elucidation of the signals and mechanisms contributing to circadian communication, will be central to fully defining the spatio-temporal networks that orchestrate plant physiology and development in each organ, tissue, and cell.



EXPERIMENTAL PROCEDURES 

Organ Dissection and Micrografting Experiments 

Organ dissection was performed as detailed in the Supplemental Experimental Procedures. For micrografting experiments, Arabidopsis seedlings were grown vertically on half-strength Murashige and Skoog (MS) agar medium with 0.5% sucrose for 3-7 days. Seedlings were placed on wet filters under the dissecting microscope in a laminar flow cabinet as described (http://www.bio-protocol.org/e1164). Cotyledons were removed, and both scion and rootstock seedlings were cut horizontally with a razor blade just below the shoot apex. Using forceps, the cut stumps of scion and rootstock were very gently joined together, taking care to match up the two phloem strands. When grafting was completed, plates were sealed with two layers of micropore tape and returned to the growth chamber for at least 4-6 more days. If present, adventitious roots on the scions were removed before luminescence analysis. Unsuccessful grafts were identified as those in which scion and rootstock failed to join properly. When grafting success was unclear, the plants were discarded. A total of 120 grafting events were assayed for WT SA-cca1/lhy Rt plants. The percentage of successfully micrografted plants was about 50% (possibly higher, but only faultlessly grafted plants were examined). Of the 59 successfully grafted WT SA-cca1/lhy Rt plants, 50 (i.e., around 85%) showed different degrees of restored rhythms (p value = 3.77 × 10⁻¹² by Fisher's exact test, considering that none of the 20 cca1/lhy SA-cca1/lhy Rt plants displayed rhythms in roots). For the control WT SA-WT Rt grafting, 22 of 24 successfully grafted plants showed very robust rhythms.
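The reported p value can be checked from the counts given above. A minimal sketch, assuming a standard two-sided Fisher's exact test on the 2 × 2 table of rhythmic versus arrhythmic roots (the text does not state the sidedness; for this table the one- and two-sided values coincide, because every table in the opposite tail is more probable than the observed one). Exact rational arithmetic avoids underflow, and the result, about 3.78 × 10⁻¹², agrees with the reported 3.77 × 10⁻¹² to within the last digit.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    row and column totals that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Exact probability that the top-left cell equals x, margins fixed.
        return Fraction(comb(row1, x) * comb(row2, col1 - x), denom)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed)
    return float(p)


# Rows: WT SA-cca1/lhy Rt grafts (59) and cca1/lhy SA-cca1/lhy Rt grafts (20);
# columns: rhythmic roots, arrhythmic roots.
p = fisher_exact_two_sided(50, 9, 0, 20)
print(f"p = {p:.3g}")
```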

RNA Extraction and RNA-Seq Analysis 

RNA extraction and RNA-seq analysis were performed as detailed in the Supplemental Experimental Procedures.

Single-Cell Confocal Microscopy Imaging 

For in vivo confocal imaging at a single-cell resolution, excised shoot apexes 
or leaves were embedded just after dissection in low-melting-point agarose 
dissolved in MS medium as previously described (Mas and Beachy, 1998). 
Further details are described in the Supplemental Experimental Procedures. 

Protoplast Preparation and Gene-Expression Analysis 

Protoplast preparation (Yoo et al., 2007) and gene-expression analysis (Malapeira et al., 2012) were performed as described. Details are given in the Supplemental Experimental Procedures.

Mathematical Analysis 

Mathematical analysis was performed as described in Hirata et al. (2015). See 
further details in the Supplemental Experimental Procedures. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Supplemental Experimental Procedures and six figures and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2015.08.062.
SUMMARY

Focal adhesion kinase (FAK) promotes anti-tumor immune evasion. Specifically, the kinase activity of nuclear-targeted FAK in squamous cell carcinoma (SCC) cells drives exhaustion of CD8+ T cells and recruitment of regulatory T cells (Tregs) in the tumor microenvironment by regulating chemokine/cytokine and ligand-receptor networks, crucially including transcription of Ccl5. These changes inhibit antigen-primed cytotoxic CD8+ T cell activity, permitting growth of FAK-expressing tumors. Mechanistically, nuclear FAK is associated with chromatin and exists in complex with transcription factors and their upstream regulators that control Ccl5 expression. Furthermore, FAK's immuno-modulatory nuclear activities may be specific to cancerous squamous epithelial cells, as normal keratinocytes do not have nuclear FAK. Finally, we show that a small-molecule FAK kinase inhibitor, VS-4718, which is currently in clinical development, also drives depletion of Tregs and promotes a CD8+ T cell-mediated anti-tumor response. Therefore, FAK inhibitors may trigger immune-mediated tumor regression, providing previously unrecognized therapeutic opportunities.

INTRODUCTION 

First described more than a decade ago (Onizuka et al., 1999; Shimizu et al., 1999), regulatory T cells (Tregs) have become recognized as a core component of the immuno-suppressive armory used by many tumors to keep the anti-tumor activity of antigen-primed CD8+ T cells at bay. Increased Treg numbers have been associated with poorer survival in ovarian (Curiel et al., 2004), gastrointestinal (Sasada et al., 2003), and esophageal (Kono et al., 2006) cancer. Indeed, the ratio of CD8+ T cells to Tregs correlates with prognosis: a low ratio, which shifts the balance from anti-tumor immunity toward tumor tolerance, is associated with poor outcome (Quezada et al., 2006; Sato et al., 2005; Shah et al., 2011). By secreting a range of chemokines and cytokines, cancer cells can promote the recruitment of Tregs into tumors and can also facilitate their peripheral expansion and retention (Darrasse-Jeze and Podsypanina, 2013; Ondondo et al., 2013). Thus, Tregs can act as a barrier to effective immune-based therapy aimed at activating a CD8+ T cell anti-tumor immune response. However, the specific signals within tumor cells that elevate intra-tumoral Tregs, giving rise to tumor tolerance, remain elusive.

FAK is a tyrosine kinase that regulates diverse cellular functions, including adhesion, migration, invasion, polarity, proliferation, and survival (Frame et al., 2010). Using targeted gene deletion in mouse skin, we have previously shown a requirement for fak in tumor initiation and progression to malignant disease (McLean et al., 2004). FAK is also required for mammary tumor progression, intestinal tumorigenesis, and the androgen-independent formation of neuroendocrine carcinoma in a mouse model of prostate cancer (Ashton et al., 2010; Lahlou et al., 2007; Luo et al., 2009a; Provenzano et al., 2008; Pylayeva et al., 2009; Slack-Davis et al., 2009). Expression of FAK is elevated in a number of tumor types (reviewed in McLean et al., 2005), and FAK inhibitors are being developed as potential cancer therapeutics (Roberts et al., 2008; Shapiro et al., 2014). Many of FAK's functions in cancer are mediated by its role in signaling downstream of integrins and growth factor receptors at the plasma membrane. FAK also contains putative nuclear localization sequences (NLS) within the F2 lobe of its FERM domain and can localize to the nucleus in response to cellular stress, where it binds to p53 (Lim et al., 2008). However, the extent of FAK's nuclear functions remains largely unknown. Here, we report a function for nuclear FAK in regulating transcription of inflammatory cytokines and chemokines, in turn promoting an immuno-suppressive, pro-tumorigenic microenvironment. This is mediated by recruitment and expansion of Tregs via FAK-regulated chemokine/cytokine networks, and we have found an important role for Ccl5 and TGF-β2. Therefore, FAK controls the tumor environment, and suppressing FAK activity, including via a clinically relevant FAK inhibitor, may be therapeutically beneficial by triggering immune-mediated tumor regression.

Figure 1. Loss of FAK or FAK Kinase Activity Results in CD8+ T Cell-Dependent SCC Tumor Clearance

(A and B) SCC FAK-WT and SCC FAK−/− subcutaneous tumor growth in immune-deficient CD-1 nude mice (A) and immune-competent FVB mice (B).

(C and D) SCC FAK−/− (C) and SCC FAK-WT (D) tumor growth in FVB mice treated with T-cell-depleting antibodies.

(E) Secondary tumor re-challenge with SCC FAK−/− (top) and SCC FAK-WT (middle) cells following a pre-challenge with SCC FAK−/− cells and a 7-day tumor-free period. Subcutaneous growth of SCC FAK-WT and SCC FAK−/− tumors injected at day 28 without pre-challenge (bottom).

(F) Tumor growth in FVB mice following subcutaneous injection of SCC FAK-WT, SCC FAK−/−, and SCC FAK-KD cells.

*p < 0.05, ** or ++p < 0.01, ****p < 0.0001; Sidak-corrected two-way ANOVA (A and B) or Tukey-corrected two-way ANOVA (C, versus SCC FAK−/−; D, versus SCC FAK-WT; F, *, versus SCC FAK−/− and +, versus SCC FAK-KD). Data are represented as mean ± SEM; n = 5-6 tumors.

RESULTS 

FAK-Deficient SCC Tumors Undergo Regression in 
an Immune-Competent Host 

We used a syngeneic model of SCC in which the fak gene had been deleted by Cre-lox recombination and mutant tumor cell lines generated (McLean et al., 2004; Serrels et al., 2012). We monitored tumor growth following injection of 1 × 10⁶ FAK-deficient cells (FAK−/−) or FAK-deficient cells that re-expressed wild-type FAK (FAK-WT) at levels comparable to endogenous FAK, in both CD-1 nude and FVB (syngeneic) host mouse strains. In CD-1 nude mice, SCC FAK−/− tumor growth was characterized by a modest growth delay (Figure 1A), as reported previously (Serrels et al., 2012). By contrast, in FVB mice, SCC FAK−/− tumors grew during the first 7 days and then regressed completely by day 21 (Figure 1B). Thus, FAK expression is required for the survival and growth of SCC tumors in FVB mice with a functional adaptive immune system.

SCC FAK−/− Tumor Regression Is Dependent on CD8+ T Cells

To characterize the role of adaptive immunity in FAK−/− SCC tumor regression, we used antibody-mediated T cell depletion in animals bearing FAK−/− tumors (Figures 1C and S1). Depletion of CD4+ T cells had no effect on tumor growth. In contrast, depletion of CD8+ T cells, either alone or in combination with CD4+ T cells, restored SCC FAK−/− tumor growth. This implies that cytotoxic CD8+ T cells were responsible for regression of FAK−/− tumors (Figure 1C) but does not exclude an accessory role for CD4+ T cells. T cell depletion in mice bearing SCC FAK-WT tumors (Figure 1D) revealed that:

(1) depletion of CD8+ T cells, either alone or in combination with CD4+ T cells, caused a significant increase in tumor growth compared to isotype-treated controls at day 14, and

(2) depletion of CD4+ T cells alone caused regression of FAK-WT SCC tumors by day 21. This implied that FAK-expressing tumors were also under negative pressure from the immune system and that cells from the CD4+ T cell compartment help protect FAK-WT tumors from immune-mediated regression (for reasons discussed below; Figure 3).

Next, we re-challenged mice with 1 × 10⁶ SCC FAK-WT cells after regression of primary FAK−/− SCC tumors, following 7 days of tumor-free survival after the tumors had regressed (Figure 1E, top and middle graphs). Neither FAK-deficient nor FAK-expressing SCC cells were able to grow after the mice had been pre-challenged with SCC FAK−/− cells. As controls, SCC FAK-WT and FAK−/− cells were injected at day 28 into mice with no pre-challenge, and these grew as expected (Figure 1E, bottom). This implies that, following FAK−/− SCC tumor regression, host mice remain immunized against further tumor challenge because immunological memory has been established. Either broad immunization against SCCs may have occurred or, more likely, the FAK−/− and FAK-WT SCCs share common antigen(s) that are expressed irrespective of FAK status. We conclude that FAK enables SCC cells to suppress an adaptive immune response rather than to circumvent it by evading recognition per se. SCC FAK−/− cells in which a kinase-deficient FAK mutant was re-expressed (SCC FAK-KD) initially grew and then regressed with kinetics only modestly delayed compared to FAK−/− cells, indicating that immune suppression depends on FAK kinase activity (Figure 1F).

We next investigated the nature of the T cell response within tumors derived from all three SCC cell lines using FACS analysis of disaggregated tumor tissue taken at day 7. We did not observe a significant change in the percentage of total CD4+ T cells (Figures 2A and S2 and Table S2) or in the percentage of CD4+ T cells that expressed the activation marker CD69 (Figure 2B). In contrast, we did observe a significant increase in the proportion of effector CD4+CD44hiCD62Llow T cells in SCC FAK−/− and FAK-KD tumors compared to FAK-WT tumors (Figures 2C and S2 and Table S2). Analysis of tumor-infiltrating CD8+ T cells revealed a significant increase in SCC FAK−/− and SCC FAK-KD tumors compared to SCC FAK-WT tumors (Figures 2D and S2 and Table S2), indicative of a heightened cytotoxic anti-tumor immune response. Staining for the activation marker CD69 identified CD8+CD69+ T cells in all tumors (Figure 2E). Further analysis revealed an increased percentage of effector CD8+CD44hiCD62Llow T cells in SCC FAK−/− and SCC FAK-KD tumors compared to SCC FAK-WT tumors (Figures 2F and S2 and Table S2), especially when effector CD8+ T cell numbers were normalized to account for the observed changes in total CD8+ T cells and presented as a “fold change” (Figure 2G). However, while SCC FAK−/− and SCC FAK-KD tumors had increased effector CD8+ T cells, activated CD8+ T cells were present in all of the SCC tumors, raising the question of why SCC FAK-WT tumors do not succumb to the cytotoxic CD8+ T cell response.

It is now established that not only the quantity of tumor-infiltrating CD8+ T cells matters, but also their “quality.” Tumor-induced T cell exhaustion has been reported in a number of tumor types, including melanoma (Fourcade et al., 2010) and ovarian cancer (Matsuzaki et al., 2010), and is characterized by expression of co-inhibitory surface receptors, including programmed death receptor 1 (PD-1), lymphocyte-activation gene 3 (LAG-3), and T cell immunoglobulin mucin-3 (Tim-3), either alone or in combination (Fourcade et al., 2010; Sakuishi et al., 2010; Wherry, 2011). Analysis of these markers on antigen-primed CD8+CD44hi T cells infiltrating SCC FAK-WT, FAK−/−, and FAK-KD tumors revealed increased surface expression of PD-1, LAG-3, and Tim-3 on CD8+CD44hi T cells in SCC FAK-WT tumors (Figures 2H-2J). Together, our data imply that antigen-primed CD8+CD44hi T cells infiltrating SCC FAK-WT tumors exhibit a heightened state of exhaustion indicative of a dysfunctional T cell response. Consistent with their exhausted state, CD8+ T cells isolated from SCC FAK-WT tumors also showed evidence of decreased proliferation (judged by Ki-67 staining; Figure 2K).

Histological staining of tumor sections taken at day 7 revealed that: (1) CD8+ T cells are present throughout all tumors, and (2) while CD8+ T cells infiltrating SCC FAK-WT tumors appear predominantly as individual cells, CD8+ T cells infiltrating SCC FAK−/− and FAK-KD tumors are clustered (Figure 2L). Thus, the ability of SCC FAK-WT tumors to evade the anti-tumor immune response is not due to limited CD8+ T cell penetration into these tumors.

FAK Expression Drives Establishment 
of an Immuno-Suppressive Environment 

Macrophages, myeloid-derived suppressor cells (MDSC), and Tregs with intrinsic immuno-suppressive capabilities can promote tumor development by inhibiting cytotoxic CD8+ T cell activity in mice and humans (Beyer and Schultze, 2006; Biragyn and Longo, 2012; Marigo et al., 2008). Flow cytometric analysis revealed no differences in macrophage or MDSC populations that correlated with tumor regression (Figures 3A, 3B, S3, and S4 and Table S2), although this does not rule out an accessory role for these cells in eventual tumor clearance. However, we did find a significantly greater number of CD4+FoxP3+CD25+ Tregs in SCC FAK-WT tumors than in FAK−/− and FAK-KD tumors (Figures 3C and S4 and Table S2). Tregs have been associated with the development of CD8+ T cell exhaustion (Sakuishi et al., 2013) and may therefore be linked to the CD8+ T cell exhaustion that we observed in SCC FAK-WT tumors (Figures 2H-2J). We next calculated the ratio of CD8+ T cells to Tregs (Figure 3D), as this ratio has been reported to correlate with prognosis in a number of tumor types (Sato et al., 2005; Shah et al., 2011). We found a substantially lower CD8+ T cell to Treg ratio in SCC FAK-WT tumors than in SCC FAK−/− and SCC FAK-KD tumors, which correlated with outcome in terms of tumor tolerance versus immune-mediated tumor regression.
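The Figure 3D legend states that the ratio was calculated from the mean values in Figures 2D and 3C, i.e., as a ratio of group means rather than an average of per-tumor ratios. The sketch below makes that distinction explicit; the function names and the input percentages are hypothetical and are not data from the study.

```python
from statistics import mean


def ratio_of_means(cd8, tregs):
    """CD8+ T cell-to-Treg ratio from group means (one value per group)."""
    return mean(cd8) / mean(tregs)


def mean_of_ratios(cd8, tregs):
    """Alternative: average of per-tumor ratios (needs paired measurements)."""
    if len(cd8) != len(tregs):
        raise ValueError("per-tumor ratios need paired measurements")
    return mean(c / t for c, t in zip(cd8, tregs))


# Hypothetical per-tumor percentages, NOT data from the study.
cd8 = [10.0, 20.0]
tregs = [5.0, 20.0]
print(ratio_of_means(cd8, tregs))  # 15 / 12.5
print(mean_of_ratios(cd8, tregs))  # (2.0 + 1.0) / 2
```

A ratio of means needs only group summaries, which suits values taken from two separate figure panels; the two approaches can differ noticeably when tumors vary widely.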

Tregs Protect FAK-WT Tumors from Immune-Mediated 
Regression 

We next examined SCC FAK-WT tumor growth in animals treated with an anti-CD25 antibody to deplete Tregs (Figure 3E). Depletion of CD25+ cells led to regression of SCC FAK-WT tumors. Therefore, FAK-dependent Tregs are required for the growth of FAK-WT tumors, creating an immuno-suppressive environment that impairs cytotoxic CD8+ T cell activity. This role of CD4+ Tregs likely explains why the CD4-depleting antibody promoted regression of SCC FAK-WT tumors (Figure 1D). We note that high Treg levels have been reported in a number of solid tumor types (Beyer and Schultze, 2006) and that elevated Tregs are linked to poor clinical outcome (Beyer and Schultze, 2006; Sato et al., 2005).

We demonstrated that Tregs derived from SCC FAK-WT tumors expressed the transcription factor (TF) Helios (Figure S5A), indicative of thymic origin (Thornton et al., 2010). Thus, we hypothesized that FAK may drive the recruitment and expansion of the intra-tumoral Tregs by influencing the availability of secreted factors.
Figure 2. FAK-Depleted Tumors Exhibit a Heightened CD8+ T Cell Response

(A) FACS quantification of total intra-tumoral CD4+ T cells.

(B) FACS quantification of CD69+ cells as a percentage of CD4+ T cells.

(C) FACS quantification of CD4+CD44hiCD62Llow, CD4+CD44hiCD62Lhi, and CD4+CD44lowCD62Llow T cell subpopulations.

(D) FACS quantification of total intra-tumoral CD8+ T cells.

(E) FACS quantification of CD69+ cells as a percentage of CD8+ T cells.

(F) Quantification of CD8+CD44hiCD62Llow, CD8+CD44hiCD62Lhi, and CD8+CD44lowCD62Llow T cell subpopulations.

(G) Changes in effector (CD8+CD44hiCD62Llow) CD8+ T cells normalized to total CD8+ T cell proportions.

(H) FACS quantification of PD-1+LAG-3+ T cells as a percentage of CD8+CD44hi tumor-infiltrating T cells, n = 6 tumors.

(I) FACS quantification of PD-1+Tim-3+ T cells as a percentage of CD8+CD44hi tumor-infiltrating T cells, n = 3 tumors.

(J) FACS quantification of PD-1+Tim-3+LAG-3+ T cells as a percentage of CD8+CD44hi tumor-infiltrating T cells, n = 3 tumors.

(K) FACS quantification of Ki-67+ cells as a percentage of tumor-infiltrating CD8+ T cells, n = 3 tumors.

(L) Representative histological staining of CD8 in frozen sections from SCC FAK-WT, SCC FAK−/−, and SCC FAK-KD tumors. Dashed white lines demarcate the tumor boundary.

Scale bars, 500 μm. *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001; ns, not significant; Tukey-corrected one-way ANOVA (C and F, CD44hiCD62Llow only). Data are represented as mean ± SEM; n = 5 tumors unless stated.



FAK Regulates the Transcription of Chemokines and 
Cytokines to Control Tregs 

To address how FAK activity in SCC cells promotes elevated intra-tumoral Tregs, we next analyzed global transcriptional profiles of SCC FAK-WT and SCC FAK−/− cells using Affymetrix GeneChip microarrays (Figure 4A). FAK expression resulted in the upregulation of 498 genes and the downregulation of 598 genes (p < 0.01). The upregulated transcript set in SCC FAK-WT cells was associated with a number of processes, including cell migration, receptor binding, secretion, wounding, and ovulation (Figure 4B, top). Analysis of this gene set revealed the chemokine ligand group of genes to be significantly overrepresented (Figure 4B, bottom), which is interesting given that a number of these chemokines and cytokines mediate both Treg recruitment to tumors and induction of peripheral Tregs within tumors (Goldstein et al., 2013; Ondondo et al., 2013).

Figure 3. FAK Regulates the Immuno-Suppressive Microenvironment

(A) FACS quantification of Ly6Chi and Ly6Clow macrophage populations expressed as a percentage of tumor-infiltrating CD45+ leukocytes.

(B) FACS quantification of Ly6ChiGr1low (M-MDSC) and Ly6CintGr1hi (G-MDSC) populations expressed as a percentage of tumor-infiltrating CD45+ leukocytes.

(C) Quantification of CD4+CD25+FoxP3+ Tregs expressed as a percentage of tumor-infiltrating CD4+ T cells.

(D) CD8+ T cell-to-Treg ratio calculated using mean values from Figures 2D and 3C.

(E) SCC FAK-WT tumor growth in FVB mice treated with anti-CD25 depleting antibody.

n = 6 tumors. * or +p < 0.05, ++p < 0.01, ***p < 0.001, **** or ++++p < 0.0001; Tukey-corrected one-way ANOVA (A, *, Ly6Chi; +, Ly6Clow). Data are represented as mean ± SEM; n = 5 tumors unless stated.
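The enrichment analysis of the upregulated gene set (Figure 4B) relied on hypergeometric tests with Benjamini-Hochberg correction. The sketch below shows both steps in plain Python: a one-sided hypergeometric upper-tail p value for over-representation of a gene family, then BH adjustment across families. The background size, gene-family definitions, and software the authors used are not given in this section, so the numbers in the usage lines are hypothetical placeholders.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: background genes; successes: genes in the family;
    draws: genes in the upregulated set; k: family genes found in that set.
    """
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, draws)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: 20,000 background genes, 498 upregulated genes,
# and three gene families of 30 genes each with 9, 2, and 1 hits.
raw = [hypergeom_sf(hits, 20000, 30, 498) for hits in (9, 2, 1)]
print(benjamini_hochberg(raw))
```

Exact integer binomials keep the tail sum free of floating-point underflow; only the final division produces a float.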

To establish which chemokines and cytokines were regulated by FAK, and to address whether the FAK-dependent transcriptional profile was linked to chemokine receptor expression on tumor-infiltrating Tregs, we performed quantitative RT-PCR (qRT-PCR) array analysis. Comparison of chemokine/cytokine transcript levels between SCC FAK-WT and SCC FAK−/− cells revealed a subset of ligands increased >2-fold in SCC FAK-WT cells (Figure 4C). Several of these (Ccl1, Ccl5, Ccl7, Cxcl10) have roles in Treg recruitment (Ondondo et al., 2013) (green arrowheads, Figure 4C), while one (Tgfb2) has a reported role in peripheral induction and expansion of Tregs (Goldstein et al., 2013) (red arrowhead, Figure 4C). To complement this, comparison of Tregs isolated from the thymus of normal FVB mice with those isolated directly from SCC FAK-WT tumors revealed a chemokine receptor switch (Figure 4D). We found increased expression of the cognate receptors for five of the six chemokine ligands upregulated in SCC FAK-WT cells (Figure 4C). These receptor changes may represent a switch from lymphoid homing receptors, including Ccr7 and Cxcr4, toward memory/effector-type chemokine receptors, including Ccr2, Ccr5, Ccr8, and Cxcr6, which are involved in recruitment to non-lymphoid tissues and sites of inflammation. Network analysis of the relationship between FAK-dependent chemokine ligand expression in SCC cells and chemokine receptor expression on tumor-infiltrating Tregs revealed a FAK-dependent paracrine signaling axis between cancer cells and intra-tumoral Tregs based on chemokine ligand-receptor interactions (Figure 4E). Furthermore, qRT-PCR analysis of Ccl5, Cxcl10, and Tgfb2 demonstrated that their expression was dependent on FAK kinase activity (Figure 4F). We note that disruption of the Ccl5/Ccr5 axis in a model of pancreatic adenocarcinoma reduces intra-tumoral Tregs and slows tumor growth (Tan et al., 2009), implying that FAK-dependent regulation of this paracrine signaling axis may be more generally important. Thus, FAK activity regulates the expression of a subset of chemokines that can specifically mediate crosstalk between tumor cells and tumor-infiltrating Tregs. This likely contributes to the recruitment and retention of CD4+FoxP3+CD25+ Tregs in SCC FAK-WT tumors.
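In the Figure 4E network, green edges mark ligand-receptor pairs upregulated at least 2-fold in both SCC FAK-WT cells and tumor-derived Tregs. A minimal sketch of that filtering step follows. The pair list is a small, illustrative subset of well-known mouse chemokine-receptor pairings (not the full network used in the figure), the function name `upregulated_axes` is hypothetical, and the fold-change values in the example are invented for illustration.

```python
# Ligand -> receptor pairs: a small, illustrative subset of known
# mouse chemokine-receptor pairings (not the network used in the figure).
PAIRS = [
    ("Ccl1", "Ccr8"),
    ("Ccl5", "Ccr5"),
    ("Ccl7", "Ccr2"),
    ("Cxcl10", "Cxcr3"),
]


def upregulated_axes(ligand_fc, receptor_fc, pairs=PAIRS, threshold=2.0):
    """Return ligand-receptor pairs where both partners are up >= threshold-fold.

    ligand_fc: fold change per ligand gene (SCC FAK-WT vs SCC FAK-/- cells).
    receptor_fc: fold change per receptor gene (tumor- vs thymus-derived Tregs).
    Genes missing from either table are treated as unchanged (fold change 1.0).
    """
    return [
        (lig, rec)
        for lig, rec in pairs
        if ligand_fc.get(lig, 1.0) >= threshold
        and receptor_fc.get(rec, 1.0) >= threshold
    ]


# Hypothetical fold changes, NOT values from the study.
ligands = {"Ccl5": 3.0, "Ccl1": 2.5, "Cxcl10": 1.5}
receptors = {"Ccr5": 4.0, "Ccr8": 1.2, "Cxcr3": 5.0}
print(upregulated_axes(ligands, receptors))
```

Requiring both partners to pass the threshold is what distinguishes a candidate paracrine axis from a change on only one side of the interaction.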

Nuclear FAK Regulates the Transcription of Ccl5 and TGF-β2 to Increase Tregs

The finding that the Tregs enriched in SCC FAK-WT tumors were likely recruited into these tumors led us to consider a potential role for Ccl5, which has been implicated in the recruitment and expansion of CD4+FoxP3+CD25+ Tregs (Tan et al., 2009), via the paracrine signaling axis that we identified. We found that efficient knockdown of Ccl5 using two independent shRNA hairpins (P1 and P2; Figure 5A) resulted in regression of SCC FAK-WT shRNA-Ccl5 tumors by days 21-27 (Figure 5B). We measured the absolute number of Tregs in SCC FAK-WT shRNA-Ccl5 tumors at day 7 and found a substantial reduction in both Ccl5-depleted tumors compared with empty-vector control SCC FAK-WT pLKO tumors (Figure 5C).

Expanding on these findings, shRNA-mediated knockdown of 
Tgfb2 expression in SCC FAK-WT cells also influenced tumor 
growth (Figures S5B and S5C). Partial knockdown of TGFp2 
had complex effects, which resulted in one of two outcomes. 
One group (Figure S5C, dashed blue line), grew more rapidly 
and ulcerated, leading to removal from study at day 14. In the 
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Figure 4. FAK Regulates Transcription of Cytokines Implicated in Treg Recruitment and Expansion 

(A) Transcriptomic profiling of SCC FAK-WT and SCC FAK~'~ cells. 

(B) Functional enrichment analysis of genes upregulated in SCC FAK-WT cells (bottom gray bar in A). Overrepresented biological processes are displayed as a 
heatmap (log 10 -transformed color scale) (top); asterisks indicate presence of cytokine-related genes. Overrepresented gene families are displayed as a bar chart 
(bottom), p < 0.05; Benjamini-Hochberg-corrected hypergeometric tests. 

(C) qRT-PCR array analysis of cytokine and chemokine expression in SCC FAK-WT and SCC FAK cells. Gray bar indicates cluster of genes upregulated in SCC 
FAK-WT cells; cytokine and chemokine gene names are listed. Green arrowheads indicate reported roles in Treg recruitment; red arrowhead indicates reported 
role in peripheral Treg induction. 

(D) qRT-PCR array analysis of chemokine and receptor expression in tumor- and thymus-derived Tregs. Gray bar indicates cluster of genes upregulated in tumor- 
derived Tregs; receptor gene names are listed. 

(E) Interaction network analysis of chemokine ligand gene expression detected in SCC cells (circles, left) and corresponding receptor gene expression detected in 
Tregs (squares, right). Genes are ordered vertically by fold change. Light gray lines connect receptor-ligand pairs; green lines indicate pairs upregulated at least 
2-fold in SCC FAK-WT cells and tumor-derived Tregs. 

(F) qRT-PCR analysis of selected cytokine and chemokine gene expression in SCC cells. ***p < 0.001 , ****p < 0.0001 ; Tukey-corrected one-way ANOVA. Data are 
represented as mean ± SEM. 
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Figure 5. Nuclear FAK Regulates Transcription of Ccl5, which Is Required for Treg Recruitment and Tumor Growth 

(A) qRT-PCR analysis of Cc/5 gene expression knockdown in SCC FAK-WT cells stably expressing two independent shRNA constructs targeting Ccl5 (PI 
and P2). 

(B) SCC FAK-WT shRNA-Ccl5 tumor growth in FVB mice, n = 6 tumors. 

(C) FACS quantitation of tumor-infiltrating Treg numbers from SCC FAK-WT shRNA-Ccl5 tumors. Data represent a single value from six pooled tumors. 

(D) Western blotting of cytoplasmic, nuclear, and total protein fractions from SCC FAK-WT, SCC FAK~'~, and SCC FAK-NLS cells. 

(E) qRT-PCR analysis of Cc/5 gene expression in SCC FAK-NLS cells. 

(F) Tumor growth of SCC FAK-NLS cells in FVB mice. 

(G) Western blotting of cytoplasmic, nuclear, and total protein fractions from SCC FAK-WT, SCC FAK /_ , and SCC FAK-KD cells. 

(H) Western blotting of whole-cell (WC) and nuclear (Nuc) protein fractions from SCC FAK-WT cells and primary skin keratinocytes. A 60 s exposure is shown for all samples; an additional 10 min exposure is shown for FAK in keratinocyte samples. GAPDH, cytoplasmic marker; PARP, nuclear marker.

***p < 0.001, ****p < 0.0001; Tukey-corrected one-way ANOVA. Data are represented as mean ± SEM unless stated otherwise.



other group that did not display such frank ulceration, we observed tumor regression by day 27 (Figure S5C, dashed red line). Analysis of Treg levels in SCC FAK-WT shRNA-TGFβ2 tumors at day 7 (regardless of initial growth characteristics) revealed that TGFβ2 knockdown was also associated with a reduction in CD4+FoxP3+CD25+ Tregs (Figure S5D). Therefore, while the effects of reducing TGFβ2 expression are more complicated than those of reducing Ccl5, FAK-dependent TGFβ2 expression does contribute to elevated CD4+FoxP3+CD25+ Treg levels in SCC FAK-WT tumors; and in the subset of mice bearing tumors that were able to complete the study, TGFβ2 knockdown also caused tumor regression.
Our findings that FAK regulated the transcription of cytokines and chemokines (including Ccl5 and TGFβ2) associated with elevated intra-tumoral Tregs and tumor tolerance led us to consider a possible role for nuclear FAK in regulating the transcription of these genes. Based on previous reports (Lim et al., 2008) that identified putative NLSs within the FERM domain of FAK, we constructed a mutant FAK optimally impaired in nuclear targeting by replacing two arginines (positions 177 and 178) and four lysines (positions 190, 191, 216, and 218) with alanines (termed FAK-NLS). Western blotting of cytoplasmic and nuclear fractions confirmed that the FAK-NLS mutant was indeed defective in nuclear localization (Figure 5D). Subsequent (q)RT-PCR analysis of Ccl5 and Tgfb2 expression in SCC cells expressing only FAK-NLS revealed that FAK nuclear localization was required for transcription of these genes (Figures 5E and S5E, respectively). Thus, nuclear FAK drives the transcription of Ccl5 and TGFβ2, which are required for recruitment and expansion of immuno-suppressive Tregs in SCC tumors, shifting the balance between CD8+ T cells and Tregs in favor of tumor tolerance. In support of this, SCC FAK-NLS tumors grew similarly to SCC FAK−/− tumors and ultimately regressed (Figure 5F), confirming that nuclear FAK protects tumors from the anti-tumor immune response. Western blotting of cytoplasmic and nuclear fractions from SCC FAK-KD cells showed that the kinase-deficient mutant was able to localize to the nucleus (Figure 5G). Because this nuclear but kinase-deficient FAK does not protect tumors from regression, we conclude that the immune-modulatory effects of FAK depend on FAK kinase activity in the nucleus.

We next examined nuclear FAK levels in primary skin keratinocytes, the normal cellular counterparts of the SCC cells used here, and did not detect nuclear FAK (Figure 5H). Thus, abundant nuclear localization of FAK, and therefore the capacity to exert regulatory control over chemokine and cytokine expression, is likely a feature of oncogenic transformation in skin keratinocytes. This suggests that the nuclear function of FAK identified here, namely regulation of chemokine/cytokine transcriptional networks, may be associated with the cancerous state when FAK is highly expressed.

Nuclear FAK Interacts with a Network of Ccl5 Transcriptional Regulators

Having established an important role for nuclear FAK-dependent transcription of Ccl5 in mediating recruitment and expansion of intra-tumoral Tregs, we wanted to determine how nuclear FAK could exert control over Ccl5 transcription. Using sucrose gradients, we fractionated the nuclei of SCC FAK-WT cells and demonstrated that nuclear FAK was present in the chromatin-containing fraction (Figure 6A). Transcriptional regulation of Ccl5 is mediated predominantly through six short regulatory elements contained within a ~300 base pair region of the Ccl5 promoter (Fessele et al., 2002). These regulatory elements contain binding sites for a number of TFs, including AP-1, C/EBP, IRF-1, NF-κB, and TATA box-binding protein (TBP), which is part of the transcription factor IID complex (TFIID). Using FAK immunoprecipitation and quantitative label-free mass spectrometry, we identified FAK binding partners in purified nuclear extracts and contextualized them by mapping them onto a network of proteins associated with predicted Ccl5 TFs (constructed in silico; Figure 6B). This integrative approach identified a subset of Ccl5 TFs, and regulators of these TFs, that interact with FAK in SCC cell nuclei (Figures 6C and S6 and Table S1). Interaction network analysis of this protein subset revealed nuclear FAK binding partners with roles in multiple transcriptional pathways, including regulators of AP-1, C/EBP, IRF-1/-7, NF-κB/Rel, and TFIID. Thus, we identified nuclear FAK binding partners that can interact, directly or indirectly, with five of the six main regulatory elements reported to control transcription of Ccl5 in multiple cell types (Fessele et al., 2002). Because our interaction network was dominated to some extent by proteins associated with the TFIID pathway, including three TBP-associated factors (TAFs) (Figures 6C and S6), we used co-immunoprecipitation to confirm the interaction of nuclear FAK with one of these, TAF9, a core component of the TFIID complex (D’Alessio et al., 2009) (Figure 6D). Our data show that FAK binds to core components of the transcriptional machinery, many of which are known to occupy the promoters of actively transcribed genes and are known or predicted to regulate Ccl5. Therefore, in SCC cells, nuclear FAK associates with chromatin and is physically linked to a network of TFs and their regulators known to modulate Ccl5 expression.

Small-Molecule FAK Kinase Inhibitor Promotes Immune-Mediated Tumor Clearance

Therapeutic targeting of FAK kinase activity using small-molecule inhibitors will inhibit FAK signaling not only in tumor cells but potentially also in multiple host cell types. To complement expression of the FAK-KD mutant protein in the cancer cells, and to investigate whether a FAK inhibitor could induce immune-mediated regression of SCC tumors, we used the FAK/Pyk2 kinase inhibitor VS-4718 (Shapiro et al., 2014), which is currently in clinical development. Mice were treated with VS-4718 (75 mg/kg) beginning 24 hr before injection of 1 × 10⁶ SCC FAK-WT or FAK−/− tumor cells and twice daily thereafter. VS-4718 treatment induced regression of SCC FAK-WT tumors by day 24 (Figure 7A). Following cessation of VS-4718 treatment, no tumor regrowth was observed (data not shown). SCC FAK−/− tumor growth and clearance were not greatly affected by VS-4718 treatment, suggesting that the anti-tumor effects of VS-4718 can be explained by FAK inhibition in tumor cells. Activity of VS-4718 was confirmed using an ELISA to measure FAK autophosphorylation on tyrosine-397 in tumor lysates from mice treated with 75 mg/kg VS-4718 (Figure S7). Regression of VS-4718-treated SCC tumors was not accompanied by loss of cell viability at day 7, as measured by FACS using a viability stain following tumor disaggregation (Figure 7B). There was a small but significant increase in leukocytes in VS-4718-treated SCC FAK-WT tumors (Figure 7C) and a significant increase in total CD4+ T cells (Figures 7D and S2 and Table S2) and effector CD4+CD44hiCD62Llow T cells (Figures 7E and S2 and Table S2). A significant increase in CD8+ T cells was also evident in VS-4718-treated SCC FAK-WT tumors (Figures 7F and S2 and Table S2), although there was no change in effector CD8+CD44hiCD62Llow T cells (Figures 7G and S2 and Table S2). Crucially, there was a significant reduction in CD4+CD25+FoxP3+ Treg cells in VS-4718-treated SCC FAK-WT tumors, similar to that observed in vehicle- and
Figure 6. Nuclear FAK Interacts with Regulators of Ccl5 Transcription

(A) Sucrose fractionation of soluble chromatin prepared from SCC FAK-WT cell nuclei. Protein preparations recovered from each fraction were analyzed by 
western blotting (top). DNA recovered from each fraction was analyzed by agarose gel electrophoresis (bottom, 1 kilobase [kb] and 100 base pair [bp] ladders 
shown). Fraction 7 (black arrowhead) represents the chromatin-containing fraction. 

(B) Schematic detailing the workflow used for proteomic analysis of the nuclear FAK interactome in the context of Ccl5 transcription factors (TFs).

(C) Interaction network analysis of proteins that bind FAK in the nucleus of SCC cells. Predicted Ccl5 TFs (squares, bottom) and respective TF binders (circles, top) enriched by at least 4-fold in nuclear FAK immunoprecipitations (SCC FAK-WT over SCC FAK−/− controls; p < 0.05) are shown (stringent network). Ccl5 TFs not detected (ND) are shown as gray squares. TF complexes or groups are indicated; proteins are labeled with gene names for clarity. TF binders are aligned above the TF groups with which they have the greatest number of reported interactions. For the full network, see Figure S6; for the protein interaction list, see Table S1.

(D) Isolation of the TFIID component TAF9 by FAK immunoprecipitation (IP) from SCC FAK-WT cell nuclear extracts. 



VS-4718-treated SCC FAK−/− tumors (Figures 7H and S4 and Table S2).

Thus, VS-4718 promoted robust anti-tumor activity, with immune cell changes similar to those observed upon FAK deletion or expression of a kinase-deficient form of FAK. Furthermore, the anti-tumor efficacy of VS-4718 was also dependent on CD8+ T cells: SCC FAK-WT tumors treated with VS-4718 in CD8+ T cell-depleted mice exhibited a growth delay but did not regress (Figure 7I). We conclude that the FAK kinase inhibitor targets mechanisms of immune suppression and may therefore represent an effective form of “immuno-modulatory” therapy that reduces Tregs in the tumor environment. Importantly, the FAK kinase inhibitor does not affect the cytotoxic function of antigen-primed CD8+ T cells. We also found that VS-4718 treatment initiated 5 days after inoculation of 1 × 10⁶ SCC FAK-WT cells, when these had
Figure 7. The FAK Kinase Inhibitor VS-4718 Leads to Immune-Mediated SCC Clearance

(A) SCC FAK-WT and SCC FAK−/− tumor growth in FVB mice treated with either vehicle or VS-4718. Treatment started 24 hr before tumor cell inoculation and continued for the duration of the experiment.

(B) FACS analysis of cell viability from disaggregated tumors treated with either vehicle or VS-4718. 

(C) FACS analysis of vehicle- or VS-4718-treated tumor-infiltrating leukocytes, expressed as a percentage of viable CD45+ cells relative to the total number of single cells.

(D) FACS analysis of tumor-infiltrating CD4+ T cells from vehicle- or VS-4718-treated tumors.

(E) FACS sub-categorization of tumor-infiltrating CD4+ T cells into CD45+CD3+CD4+CD8−CD44hiCD62Llow, CD45+CD3+CD4+CD8−CD44hiCD62Lhi, and CD45+CD3+CD4+CD8−CD44lowCD62Llow populations.

(F) FACS analysis of tumor-infiltrating CD8+ T cells from vehicle- or VS-4718-treated tumors.

(G) FACS sub-categorization of tumor-infiltrating CD8+ T cells into CD45+CD3+CD4−CD8+CD44hiCD62Llow, CD45+CD3+CD4−CD8+CD44hiCD62Lhi, and CD45+CD3+CD4−CD8+CD44lowCD62Llow populations.

(H) FACS analysis of tumor-infiltrating CD4+CD25+FoxP3+ Tregs, expressed as a percentage of tumor-infiltrating CD4+ T cells.

(I) SCC FAK-WT tumor growth in FVB mice treated with either vehicle or VS-4718 and either isotype control or CD8-depleting antibodies. 

(J) SCC FAK-WT and SCC FAK−/− tumor growth in FVB mice treated with either vehicle or VS-4718. Treatment started 5 days after tumor cell inoculation (gray dashed line) and continued for the duration of the experiment.

*p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001; ns, not significant; Tukey-corrected one-way ANOVA (E and G, CD44hiCD62Llow only). Data are represented as mean ± SEM; n = 6 tumors.
already formed palpable tumors of ~50 mm³, led to complete tumor regression (Figure 7J).

DISCUSSION 

We show that nuclear FAK in SCC cancer cells drives the transcription of chemokines and cytokines, including Ccl5 and TGFβ2, which promote the formation of an immuno-suppressive, pro-tumorigenic microenvironment. This depends on FAK kinase activity: expression of a catalytically inactive mutant FAK protein (FAK-KD) or treatment with a small-molecule inhibitor causes tumor regression. The inhibitor is effective even when tumors are already established, raising the exciting possibility that targeting FAK kinase activity may have immune-mediated anti-tumor efficacy in patients. We established that nuclear FAK is associated with chromatin and interacts with a number of TFs and transcriptional regulators, including components of the TFIID complex, that are linked to regulation of Ccl5 expression. Our data imply that FAK interacts with the core transcriptional machinery to influence gene transcription and promote tumor immune escape.

Historically, FAK has been recognized as an adhesion-related non-receptor protein tyrosine kinase that clusters at focal adhesion (FA) structures and regulates cancer-associated processes, including adhesion, migration, invasion, survival, and proliferation (reviewed in Frame et al., 2010). FAK was also found to translocate to the nucleus (Lim et al., 2008; Luo et al., 2009b), suggesting that it has functions within the nucleus. Our data show that, at least in cancer cells, FAK regulates inflammatory transcriptional programs associated with generation and maintenance of a pro-tumorigenic and immuno-suppressive microenvironment. FAK associates with chromatin and, in the context of Ccl5 expression, interacts with a number of TFs, and regulators of TFs, that bind regulatory elements in the Ccl5 promoter (Fessele et al., 2002). Our data imply that FAK exists in complexes with a number of TAF proteins, including TAF9 and TAF12, key components of the core promoter complex TFIID, which initiates transcription by driving recruitment of chromatin remodeling complexes, coactivators, and RNA polymerase II to the promoter (D’Alessio et al., 2009). We therefore propose that FAK interacts with components of the core transcriptional machinery to drive transcription of chemokines and cytokines that contribute to recruitment of Tregs into the tumor environment, promoting immunological tolerance and permitting tumor growth.

Recently, nuclear accumulation of active FAK (phosphorylated on Tyr-397) within tumor cells of patients with colorectal cancer was reported to correlate with poor prognosis (Albasri et al., 2014), highlighting the need to understand the nature of FAK’s role within the nucleus. Studies using endothelial cells, muscle cells, and fibroblasts have previously reported low steady-state levels of nuclear FAK that are substantially increased in response to cellular stress (Lim, 2013; Lim et al., 2008; Luo et al., 2009b). Our work implies that oncogenic stress is another route to inducing high levels of nuclear FAK and that this, in turn, can influence transcriptional programs, such as the chemokine and cytokine networks that control the tumor microenvironment.



A number of therapeutic strategies targeting components of the immuno-suppressive tumor microenvironment are currently being tested, with the aim of restoring anti-tumor immunity by releasing the brake on CD8+ T cell cytotoxic activity. In pre-clinical models of cancer, targeting Tregs (Ali et al., 2014; Bos et al., 2013) has shown anti-tumor efficacy, either alone or in combination with agents that enhance CD8+ T cell activation. A clinical study combining agents targeting cytotoxic T-lymphocyte-associated antigen 4 (CTLA-4), which is thought to influence Treg function (Peggs et al., 2009; Quezada et al., 2006; Simpson et al., 2013; Wing et al., 2008), and PD-1, a receptor that transmits signals inhibiting T cell function, has reported impressive responses in patients with advanced melanoma (Wolchok et al., 2013). However, this combination of checkpoint blockade antibodies elicits substantial side effects in >50% of patients, highlighting the need to find alternative combinations with improved tolerability. We have shown that targeting FAK kinase activity has the potential to modulate intra-tumoral Treg levels, resulting in robust CD8+ T cell anti-tumor immunity, while others have reported previously that FAK kinase inhibitors block monocyte/macrophage and cancer-associated fibroblast recruitment into tumors by virtue of FAK’s role in regulating their migration (Stokes et al., 2011). Taken together, these findings suggest that targeting the pleiotropic cellular functions of FAK may have a broad impact on the immuno-suppressive tumor microenvironment, differentiating these agents from many therapeutic approaches that target single immune cell populations.

Targeting a molecular pathway that is upregulated in cancer cells may provide tumor specificity and help to overcome some of the potential issues with severe autoimmunity when modulating immune cell populations. FAK inhibitors, such as VS-4718, are in clinical development; VS-4718 is currently in a phase I dose-escalation clinical trial in patients with solid tumors (www.clinicaltrials.gov NCT01849744). Our findings provide a good rationale for pre-clinical and clinical testing of FAK kinase inhibitors alongside agents that stimulate CD8+ T cell activity, such as the checkpoint blockade therapies that target PD-1 and CTLA-4, both of which are in clinical development (Pardoll, 2012).

EXPERIMENTAL PROCEDURES 

Experiments involving animals were carried out in accordance with the UKCCCR guidelines by approved protocol (HO PL 60/4248). Brief experimental procedures are listed here. For details, please see the Supplemental Experimental Procedures.

Generation of FAK Nuclear Localization Mutant 

Mutations were introduced into FAK-WT at R177A, R178A, K190A, K191A, 
K216A, and K218A using PCR-based site-directed mutagenesis. 
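The six substitutions that define FAK-NLS can be applied and checked programmatically before primer design. The sketch below uses a hypothetical helper, `apply_substitutions`, not a tool the authors describe (the mutations themselves were made by PCR-based site-directed mutagenesis). It runs on a toy sequence because no FAK sequence is given here. Residue numbering must match the FAK sequence actually used.

```python
import re
from typing import List

# Substitutions introduced to make FAK-NLS, as listed in the text.
FAK_NLS_MUTATIONS = ["R177A", "R178A", "K190A", "K191A", "K216A", "K218A"]

_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence: str, mutations: List[str]) -> str:
    """Apply substitutions written as <wild-type><1-based position><new>, e.g. 'R177A'.

    Each wild-type residue is checked before it is replaced, so a numbering
    mismatch raises ValueError instead of silently mutating the wrong residue.
    """
    residues = list(sequence)
    for mutation in mutations:
        match = _SUBSTITUTION.match(mutation)
        if match is None:
            raise ValueError(f"unrecognized substitution: {mutation}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{mutation}: position outside sequence of length {len(residues)}")
        found = residues[position - 1]
        if found != wild_type:
            raise ValueError(f"{mutation}: expected {wild_type} at {position}, found {found}")
        residues[position - 1] = new
    return "".join(residues)


# Toy 220-residue sequence (NOT the FAK sequence): glycine everywhere except the
# six targeted positions, which carry the wild-type residues named above.
toy = ["G"] * 220
for pos in (177, 178):
    toy[pos - 1] = "R"
for pos in (190, 191, 216, 218):
    toy[pos - 1] = "K"
mutant = apply_substitutions("".join(toy), FAK_NLS_MUTATIONS)
print(mutant.count("A"))  # 6
```

Checking the wild-type residue at each position is the main design choice. It catches off-by-one or isoform numbering errors, which are easy to make when six sites are mutated at once.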

Cell Lines 

Isolation and generation of the FAK SCC cell model is described in Serrels et al. 
(2012). Keratinocyte cultures were prepared as detailed in McLean et al. (2004).

Western Blot Analysis 

To prepare whole-cell lysates, cells were washed in cold PBS and lysed in RIPA buffer. Cytoplasmic and nuclear extracts were prepared as described in Lim et al. (2008). Lysates were resolved by gel electrophoresis, transferred to nitrocellulose, and probed with the respective antibodies.
Subcutaneous Tumor Growth

Cells were injected into both flanks of either CD-1 nude mice or FVB mice, and tumor growth was measured twice weekly. Animals were sacrificed when tumors reached the maximum allowed size or when signs of ulceration were evident. For treatment with VS-4718, the drug was prepared in 0.5% carboxymethyl cellulose + 0.1% Tween 80, and mice were treated at 75 mg/kg BID by gavage. No signs of toxicity were observed.
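As a worked example of the 75 mg/kg BID regimen, the calculation below assumes a hypothetical 25 g mouse. Body weight is not stated here, and in practice each dose scales with the animal's measured weight:

```latex
\begin{aligned}
\text{dose per administration} &= 75~\text{mg/kg} \times 0.025~\text{kg} = 1.875~\text{mg}\\
\text{daily dose (BID)} &= 2 \times 1.875~\text{mg} = 3.75~\text{mg}
\end{aligned}
```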

Tumor Growth following Re-Challenge 

SCC FAK−/− cells were injected into the left flank of FVB mice. Following tumor regression, mice were left for 7 days before being challenged with SCC FAK-WT or FAK−/− cells injected into the right flank. Tumor growth was measured twice weekly. Control groups were injected into both flanks at day 28 using mice that had not been pre-challenged with SCC FAK−/− cells.

CD4+, CD8+, and CD25+ T Cell Depletion

T cell depletion was achieved by IP injection of 150 µg of depleting antibody into female age-matched FVB mice on 3 consecutive days and was maintained by further IP injections at 3-day intervals until the study was terminated. SCC FAK-WT or FAK−/− cells were injected into both flanks 6 days after the initial antibody treatment, and tumor growth was measured. The extent of T cell depletion was determined at the end of the study using FACS (Figure S1).

FACS Analysis of Immune Cell Populations 

Tumors established following injection of SCC cells into both flanks of an FVB mouse were removed at day 7. Tumor tissue was processed to obtain a single-cell suspension for staining and subsequent FACS analysis (antibodies listed in Table S2).

Gene Expression Profiling 

RNA was analyzed using the GeneChip Mouse Genome 430 2.0 Array. 
Normalized data for differentially expressed genes were median centered 
and clustered using Cluster 3.0 and Java TreeView. Functional enrichment 
analysis was performed using ToppGene. 
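Median centering followed by hierarchical clustering can be illustrated with a short pure-Python sketch. Cluster 3.0 offers several distance metrics and linkage methods, and the text does not say which were used. This sketch assumes per-gene (row-wise) median centering, a Pearson-correlation distance, and average linkage. Pearson correlation is itself unchanged by the centering step; centering matters mainly for the heat-map display and for uncentered metrics. Gene names and values are hypothetical.

```python
import math
import statistics
from typing import Dict, FrozenSet, List, Tuple


def median_center(profile: List[float]) -> List[float]:
    """Subtract a gene's median across samples so its profile is centered on zero."""
    med = statistics.median(profile)
    return [x - med for x in profile]


def pearson_distance(a: List[float], b: List[float]) -> float:
    """1 - Pearson correlation: 0 = same shape, 2 = perfectly anticorrelated."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # a flat profile carries no correlation information
    return 1.0 - cov / (norm_a * norm_b)


Merge = Tuple[FrozenSet[str], FrozenSet[str], float]


def average_linkage(data: Dict[str, List[float]]) -> List[Merge]:
    """Agglomerative clustering of genes; returns merges in the order they occur."""
    centered = {gene: median_center(values) for gene, values in data.items()}
    genes = list(centered)
    dist = {}
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            d = pearson_distance(centered[a], centered[b])
            dist[(a, b)] = dist[(b, a)] = d
    clusters = [frozenset([g]) for g in genes]
    merges: List[Merge] = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # average linkage: mean of all gene-to-gene distances between the two clusters
                d = sum(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


expression = {  # hypothetical normalized log2 values across four samples
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
for left, right, d in average_linkage(expression):
    print(sorted(left), sorted(right), round(d, 3))
```

geneA and geneB rise together, so they merge first at a distance near 0. geneC falls as they rise, so it joins last at a distance near 2.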

Quantitative RT²-PCR Array Analysis of Cytokine, Chemokine, and Chemokine Receptor Expression

RNA prepared from SCC cells was analyzed using the mouse cytokine and chemokine RT² Profiler PCR Array, and RNA from isolated Tregs was analyzed using the mouse chemokine and receptor array. Relative gene expression (2^−ΔCt) values were log transformed, median centered, and subjected to hierarchical clustering as for microarray analysis. An interactome of chemokine ligands and receptors was constructed using the IUPHAR/BPS Guide to Pharmacology database and curated from the literature; expression data for detected genes were mapped onto this interactome and visualized using Cytoscape. Expression of selected cytokine and chemokine genes was assessed by standard quantitative RT-PCR.
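The 2^−ΔCt transformation and median centering can be sketched as follows. This sketch assumes log base 2 and a single reference Ct per sample, for example the mean of the array's housekeeping genes. The text specifies neither, and the function names and Ct values are hypothetical. Because log2(2^−ΔCt) = −ΔCt, the log-transformed values are simply negated ΔCt values before centering.

```python
import math
import statistics
from typing import Dict, List


def relative_expression(ct_gene: float, ct_reference: float) -> float:
    """2^-ΔCt, where ΔCt = Ct(gene) - Ct(reference) within the same sample."""
    return 2.0 ** -(ct_gene - ct_reference)


def log_median_center(
    ct_by_gene: Dict[str, List[float]], reference_ct: List[float]
) -> Dict[str, List[float]]:
    """For each gene: 2^-ΔCt per sample, then log2, then subtract the gene's median."""
    result = {}
    for gene, cts in ct_by_gene.items():
        logs = [math.log2(relative_expression(ct, ref)) for ct, ref in zip(cts, reference_ct)]
        med = statistics.median(logs)
        result[gene] = [x - med for x in logs]
    return result


# Hypothetical Ct values for one gene in three samples, each with a reference Ct of 18.0.
centered = log_median_center({"Ccl5": [22.0, 20.0, 24.0]}, [18.0, 18.0, 18.0])
print(centered)  # {'Ccl5': [0.0, 2.0, -2.0]}
```

A lower Ct means more template, so the second sample (Ct 20.0) ends up 2 log2 units above the gene's median.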

shRNA-Mediated TGFβ2 and Ccl5 Knockdown

Cells were subjected to two rounds of lentiviral infection prior to selection with puromycin. The shRNA constructs used were from the pLKO lentiviral TRC library.

Preparation and Fractionation of Nuclei and Chromatin 

Nuclei were prepared as described (Gilbert et al., 2003) but with a reduced concentration (0.05%) of NP-40 in nuclei buffer B. Soluble chromatin was prepared as described (Gilbert et al., 2004) and fractionated on a sucrose step gradient to separate soluble and chromatin-associated nuclear proteins. DNA was recovered from fractions and subjected to agarose gel electrophoresis. Protein was purified using TCA precipitation. Samples were analyzed by SDS-PAGE and blotted using anti-FAK, HP1α, and histone H3 antibodies.

Proteomic Analysis of Nuclear FAK Protein Complexes 

FAK nuclear protein complexes were subjected to on-bead proteolytic digestion, desalting, and liquid chromatography-tandem mass spectrometry, as described (Turriziani et al., 2014). For interaction network analysis, Ccl5 transcription factors were extracted from the DECODE database and used to seed a network of 1,000 transcription factor-related proteins using the GeneMANIA plugin in Cytoscape. Proteins specifically isolated in nuclear FAK protein complexes were mapped onto the interactome, and those with physical or predicted, direct or indirect interactions with Ccl5 transcription factors were analyzed using the NetworkAnalyzer plugin in Cytoscape.
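The filtering step behind the stringent network can be sketched in two parts. The thresholds come from the Figure 6C legend: at least 4-fold enrichment in SCC FAK-WT over SCC FAK−/− immunoprecipitations, with p < 0.05. Everything else here is an assumption: the function names, the pseudocount, the placeholder proteins, the invented intensities, and the use of Tbp and Rela as example TF symbols. The statistical test is not stated here, so p-values are taken as given. The sketch only checks direct network edges, whereas the authors used GeneMANIA and NetworkAnalyzer in Cytoscape, which also consider indirect interactions.

```python
from typing import Dict, Iterable, Set, Tuple


def enriched_binders(
    wt_intensity: Dict[str, float],
    ko_intensity: Dict[str, float],
    p_values: Dict[str, float],
    min_fold: float = 4.0,
    alpha: float = 0.05,
    pseudocount: float = 1.0,
) -> Set[str]:
    """Proteins enriched >= min_fold in FAK-WT over FAK-/- IPs with p < alpha.

    The pseudocount keeps the ratio finite when a protein is absent from the
    control IP; its value is an arbitrary assumption.
    """
    hits = set()
    for protein, wt in wt_intensity.items():
        ko = ko_intensity.get(protein, 0.0)
        fold = (wt + pseudocount) / (ko + pseudocount)
        if fold >= min_fold and p_values.get(protein, 1.0) < alpha:
            hits.add(protein)
    return hits


def links_to_tfs(
    binders: Set[str], edges: Iterable[Tuple[str, str]], tfs: Set[str]
) -> Dict[str, Set[str]]:
    """Map each binder to the TFs it shares a direct network edge with."""
    links: Dict[str, Set[str]] = {}
    for a, b in edges:
        for node, partner in ((a, b), (b, a)):  # edges are treated as undirected
            if node in binders and partner in tfs:
                links.setdefault(node, set()).add(partner)
    return links


# Invented intensities and p-values for four placeholder proteins.
wt = {"protA": 800.0, "protB": 500.0, "protC": 1000.0, "protD": 90.0}
ko = {"protA": 10.0, "protB": 200.0, "protC": 900.0, "protD": 5.0}
p = {"protA": 0.01, "protB": 0.02, "protC": 0.5, "protD": 0.2}
hits = enriched_binders(wt, ko, p)
network = [("protA", "Tbp"), ("protB", "Tbp"), ("protD", "Rela")]
print(hits, links_to_tfs(hits, network, {"Tbp", "Rela"}))  # {'protA'} {'protA': {'Tbp'}}
```

Applying both criteria matters here. protD is strongly enriched but fails the p-value cutoff. protB is significant but only about 2.5-fold enriched.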

CD8 T Cell Fluorescent Immunohistochemistry 

Tumors were removed 7 days post-implantation and frozen by submersion in liquid nitrogen. Tumor sections were cut, processed, stained, and imaged using an Olympus FV1000 confocal microscope.
SUMMARY

Triple-negative breast cancer (TNBC) is a highly aggressive form of breast cancer that exhibits extremely high levels of genetic complexity and yet a relatively uniform transcriptional program. We postulate that TNBC might be highly dependent on uninterrupted transcription of a key set of genes within this gene expression program and might therefore be exceptionally sensitive to inhibitors of transcription. Utilizing kinase inhibitors and CRISPR/Cas9-mediated gene editing, we show here that triple-negative but not hormone receptor-positive breast cancer cells are exceptionally dependent on CDK7, a transcriptional cyclin-dependent kinase. TNBC cells are unique in their dependence on this transcriptional CDK and suffer apoptotic cell death upon CDK7 inhibition. An “Achilles cluster” of TNBC-specific genes is especially sensitive to CDK7 inhibition and frequently associated with super-enhancers. We conclude that CDK7 mediates transcriptional addiction to a vital cluster of genes in TNBC and that CDK7 inhibition may be a useful therapy for this challenging cancer.

INTRODUCTION 

Recent advances in genomic sequencing have led to an unprecedented understanding of the genetics of tumor heterogeneity (Fisher et al., 2013). For a number of cancers, this has led to the discovery of “driver” oncogenes such as mutant BRAF, EGFR, and EML4-ALK, which has informed rational drug development strategies (Chin et al., 2011). For other tumors, however, sequencing has only revealed a striking level of heterogeneity and has not resulted in the identification of clear driver mutations (Cancer Genome Atlas Research Network, 2011, 2012). Despite this genetic heterogeneity, a number of these tumors can be readily identified based upon their gene expression programs (Hoadley et al., 2014). We hypothesized that, despite the genetic heterogeneity, maintenance of these uniform gene expression programs might require continuous active transcription and therefore be more sensitive to drugs that target transcription.

We evaluated this hypothesis in the context of triple-negative breast cancer (TNBC) because this subtype is characterized by high genetic complexity (Abramson et al., 2015; Cancer Genome Atlas Network, 2012) and has a characteristic gene expression program (Parker et al., 2009; Perou et al., 2000). Compared to hormone receptor (estrogen and/or progesterone receptor)-positive (ER/PR+) breast cancer, TNBC demonstrates a higher level of genetic complexity, as indicated by higher rates of point mutation, gene amplification, and deletion (Cancer Genome Atlas Network, 2012). Notably, TNBC lacks common genetic alterations other than mutations of tumor suppressor genes such as INPP4B, PTEN, and TP53 (Abramson et al., 2015; Andre et al., 2009; Cancer Genome Atlas Network, 2012; Gewinner et al., 2009; Shah et al., 2012), a situation that has limited the development of “targeted” therapies. The highly aggressive nature of TNBC and the lack of effective therapeutics make this disease a high priority for discovery biology efforts.

Targeting gene transcription for cancer therapy has long been considered difficult because transcription presumably plays a universal role in non-malignant cells and tissues; consequently, pharmacologic inhibition of the general transcriptional machinery might lack selectivity for cancer cells and cause intolerable toxicity. Recent studies, however, have challenged this paradigm and found that transcription of certain genes is disproportionately sensitive to inhibition of transcription (Dawson et al., 2011; Delmore et al., 2011; Chapuy et al., 2013; Chipumuro et al., 2014; Christensen et al., 2014; Kwiatkowski et al., 2014; Zuber et al., 2011). Those genes, often encoding oncogenic drivers with short mRNA and protein half-lives (e.g., MYC, MYCN, and RUNX1), have a striking dependence on continuous active transcription, thereby allowing for highly selective effects before “global” downregulation of transcription is achieved. The continuous active transcription of these genes in cancer cells is often driven by exceptionally large clustered enhancer regions, called super-enhancers, that are densely occupied by transcription factors and co-factors (Hnisz et al., 2013, 2015; Loven et al., 2013).

The control of gene transcription involves a set of cyclin-dependent kinases (CDKs), including CDK7, CDK8, CDK9, CDK12, and CDK13, that play essential roles in transcription initiation and elongation by phosphorylating RNA polymerase II (RNAPII) and other components of the transcription apparatus (Akhtar et al., 2009; Larochelle et al., 2012; Zhou et al., 2012). We recently discovered a selective CDK7 inhibitor, THZ1, that covalently binds to CDK7 and suppresses its kinase activity with an unanticipated level of selectivity based upon modification of a unique cysteine residue (Kwiatkowski et al., 2014). We further identified a therapeutic effect of CDK7 inhibition in several types of cancer, including MYCN-amplified neuroblastoma, small-cell lung cancer, and T cell acute lymphoblastic leukemia (Chipumuro et al., 2014; Christensen et al., 2014; Kwiatkowski et al., 2014). Here, we report that TNBC demonstrates a profound dependence on CDK7. We further identified an “Achilles cluster” of TNBC genes that require CDK7 to maintain expression and that apparently mediate the extreme sensitivity of TNBC to CDK7 inhibition.

RESULTS 

Exceptional Sensitivity of TNBC Cells to Covalent Inhibition of CDK7

To investigate whether the proliferation of TNBC cells is sensitive to CDK7 inhibition, we treated triple-negative or ER/PR+ breast cancer cell lines with increasing concentrations of THZ1. While ER/PR+ cells were largely unaffected by THZ1 treatment at micromolar doses, triple-negative breast cancer cells were highly sensitive to CDK7 inhibition, with cell proliferation effectively suppressed by low nanomolar concentrations of THZ1 (IC50 < 70 nM) (Figures 1A and 1B). In contrast to their extreme sensitivity to THZ1, TNBC cells were more resistant to a non-cysteine-reactive analog of THZ1 (THZ1-R) (Kwiatkowski et al., 2014) (Figure S1A), suggesting that the ability of THZ1 to bind its target covalently determines its antiproliferative potency.
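The text reports IC50 thresholds but does not describe here how they were derived. Dose-response data are usually fit with a four-parameter logistic model. As a lighter-weight illustration only, the hypothetical helper below estimates IC50 by linear interpolation on a log10(concentration) axis between the two tested doses that bracket 50% viability. The dose-response values are invented.

```python
import math
from typing import List, Optional


def interpolate_ic50(concentrations_nm: List[float], viability: List[float]) -> Optional[float]:
    """Estimate IC50 by linear interpolation on a log10(concentration) axis.

    concentrations_nm: ascending, strictly positive doses.
    viability: response as a fraction of vehicle control (1.0 = no effect).
    Returns None when the response never falls to 50% within the tested range,
    as expected for a largely insensitive line.
    """
    points = list(zip(concentrations_nm, viability))
    for (c_lo, v_lo), (c_hi, v_hi) in zip(points, points[1:]):
        if v_lo >= 0.5 >= v_hi:
            if v_lo == v_hi:  # flat at exactly 50%: report the lower dose
                return c_lo
            frac = (v_lo - 0.5) / (v_lo - v_hi)
            log_lo, log_hi = math.log10(c_lo), math.log10(c_hi)
            return 10 ** (log_lo + frac * (log_hi - log_lo))
    return None


# Hypothetical dose-response data (nM, fraction of vehicle control).
sensitive = interpolate_ic50([1, 10, 100, 1000], [1.0, 0.8, 0.2, 0.05])
insensitive = interpolate_ic50([1, 10, 100, 1000], [1.0, 0.95, 0.9, 0.85])
print(sensitive, insensitive)
```

Interpolating in log-concentration space reflects how such doses are usually spaced (tenfold steps). Returning None, rather than extrapolating, is how the sketch reports a line that stays above 50% viability across the tested range.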

To understand the mechanism underlying the highly selective effect of THZ1, we next tested whether CDK7 is equally inhibited in triple-negative and ER/PR+ breast cancer cells. CDK7 is implicated in regulating the phosphorylation of the carboxyl-terminal domain (CTD) of RNAPII at multiple sites (Ser2, Ser5, and Ser7), either directly or by phosphorylating and activating other CDKs (Akhtar et al., 2009; Glover-Cutter et al., 2009; Larochelle et al., 2012; Zhou et al., 2012). We exposed cells to increasing doses of THZ1 or THZ1-R and found that, in both triple-negative and ER/PR+ breast cancer cells, CTD phosphorylation at S2, S5, and S7 was effectively suppressed by THZ1 but not by the inactive THZ1-R (Figures 1C and S1B). The similar effects of THZ1 on CTD phosphorylation indicate that CDK7 is similarly targeted in both drug-sensitive and drug-resistant cells; thus, TNBC cells appear to be far more dependent on the activity of CDK7 than ER/PR+ breast cancer cells.

We further found that CDK7 inhibition efficiently induced apoptotic cell death in TNBC cells, as indicated by induced cleavage of PARP and Caspase 3 (Figure 1D). In line with the differential response to CDK7 inhibition, cell death was not observed in ER/PR+ breast cancer cells treated with THZ1 (Figure 1D). Consistent with previous studies, THZ1 treatment also failed to induce cell death in non-transformed human cell lines (BJ fibroblasts and retinal pigment epithelial cells, RPE-1) (Figure 1D) (Kwiatkowski et al., 2014). Notably, RNAPII CTD phosphorylation was suppressed by THZ1 in all of these cell lines (Figure 1D) and thus did not correlate with cell fate, again indicating an exceptional dependence on CDK7-regulated pathways in TNBC cells.

In addition to regulating RNAPII CTD phosphorylation, CDK7 is a component of the CDK-activating kinase (CAK), which is thought to phosphorylate and activate all CDKs, including cell-cycle CDKs (Schachter and Fisher, 2013). Indeed, CDK7 has been implicated in phosphorylating CDK1 and regulating mitosis (Larochelle et al., 2007), and THZ1 treatment has been shown to induce a G2/M arrest in neuroblastoma cells with MYCN amplification (Chipumuro et al., 2014). Surprisingly, THZ1 treatment did not alter the cell cycle in TNBC cells (Figure S1C). To further investigate whether mitosis is impaired by CDK7 inhibition, we used live-cell imaging to follow the progression of mitosis. Mitosis of TNBC cells (MDA-MB-468) progressed normally in the presence of THZ1 (Figure S1D and Movies S1 and S2), and the time from nuclear envelope breakdown (NEBD) to anaphase onset was not significantly changed by THZ1 treatment (Figure S1E). Despite the lack of mitotic arrest, THZ1 efficiently induced cell death (Figures S1D and S1F and Movie S1). Therefore, the sensitivity of TNBC cells to CDK7 inhibition is likely not derived from a role of CDK7 in directly regulating cell-cycle-related CDKs.

Next, we investigated whether the antiproliferative effects of THZ1 in established TNBC cell lines would translate to primary TNBC samples. To address this, we established primary cultures of tumor cells from patient-derived xenografts (PDX) of TNBC and treated these cells with THZ1. In three independent patient-derived TNBC cultures, THZ1 effectively reduced cell viability (IC50 < 100 nM) (Figure 1E). Consistent with our findings in established cancer cell lines, two ER/PR+ primary cultures were largely insensitive to THZ1 (Figures 1E and 1F). Furthermore, treatment of primary TNBC cells with THZ1 suppressed RNAPII CTD phosphorylation and induced apoptotic cell death (Figure 1G). Given that the primary samples were derived from patients with TNBC who had progressed on multiple lines of chemotherapy, our data indicate that CDK7 inhibition may provide an effective therapeutic option for patients with this aggressive disease.
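For readers summarizing dose-response data such as the primary-culture viability curves above, a minimal sketch follows. It estimates an IC50 by linear interpolation on log10(concentration) between the two tested doses that bracket 50% viability. This is an assumption for illustration only: the curve-fitting procedure behind the reported values is not described in this section, the function name is hypothetical, and the example viability values are invented.

```python
import math

def ic50_by_interpolation(concs_nm, viability_pct):
    """Estimate IC50 by linear interpolation on log10(concentration).

    concs_nm: increasing drug concentrations in nM (all > 0).
    viability_pct: viability at each concentration, as % of vehicle control.
    Returns the concentration at which viability first falls to 50%,
    or None if it never does within the tested range.
    """
    if len(concs_nm) != len(viability_pct):
        raise ValueError("concentration and viability lists must have equal length")
    points = list(zip(concs_nm, viability_pct))
    for (c_lo, v_lo), (c_hi, v_hi) in zip(points, points[1:]):
        if v_lo >= 50.0 >= v_hi and v_lo != v_hi:
            # Fraction of the (log-scale) interval at which the 50% line is crossed
            frac = (v_lo - 50.0) / (v_lo - v_hi)
            log_lo, log_hi = math.log10(c_lo), math.log10(c_hi)
            return 10 ** (log_lo + frac * (log_hi - log_lo))
    return None

if __name__ == "__main__":
    doses = [1, 10, 100, 1000]          # nM, hypothetical
    sensitive = [98, 80, 20, 5]         # hypothetical sensitive response
    resistant = [100, 97, 90, 75]       # hypothetical resistant response
    print(ic50_by_interpolation(doses, sensitive))   # ~31.6 nM
    print(ic50_by_interpolation(doses, resistant))   # None: IC50 above tested range
```

Interpolating on the log scale matches how such dilution series are usually spaced; a fitted four-parameter logistic curve would be the more robust choice when many points are available.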

An Analog of THZ1 with Improved Pharmacokinetics 

Despite the high antiproliferative potency of THZ1 in primary TNBC cells, its limited stability in vivo (T1/2 of 45 min in mouse plasma) restricts its utility for in vivo investigations. We therefore modified the structure of THZ1 by altering the regiochemistry of its acrylamide from 4-acrylamide-benzamide to 3-acrylamide-benzamide, giving rise to the analog THZ2 (Figure 2A). THZ2 had significantly improved pharmacokinetic features, with a 5-fold longer half-life in vivo (Figure 2B). Similar to THZ1, THZ2 selectively targeted CDK7 (Figure 2C and Table S1) and potently inhibited the growth of triple-negative, but not ER/PR+, breast cancer cells (Figures 2D and 2E). THZ2 at low

Figure 1. CDK7 Inhibition Selectively Targets TNBC Cells

(A) Cell growth curves of triple-negative (red) and ER/PR+ (blue) breast cancer cell lines. Cells were treated with increasing concentrations of THZ1 for 48 hr. Cells 
were then fixed and stained for the quantification of cell growth. Data are presented as means ± SD. 

(B) Bright-field images of cells that were treated with vehicle control or THZ1 (40 nM) for 7 days. Note that THZ1 induced cell death in triple-negative, but not ER/PR+, breast cancer cells.

(C) THZ1 inhibits RNAPII CTD phosphorylation in both triple-negative (MDA-MB-468) and ER/PR+ (ZR-75-1) breast cancer cells. Cells were treated with vehicle control (first lane) or increasing concentrations of the indicated drug (2, 10, 50, 250, 1,250, and 6,250 nM) for 4 hr before lysates were prepared for immunoblotting.

(D) Immunoblotting analysis of lysates harvested from cells treated for 24 hr with vehicle control or THZ1 (100 nM). Samples, in loading order, were triple-negative (MDA-MB-468, BT549, HCC1187) and ER/PR+ (ZR-75-1, T47D) breast cancer cells and normal human cells (RPE-1, BJ1).

(E) Indicated TNBC (red) or ER/PR+ (blue) primary cultures were treated with increasing concentrations of THZ1. Cell viability was measured with the CellTiter-Glo Luminescent Cell Viability Assay after 48 hr of treatment. Data are presented as mean ± SD.

(F) Triple-negative (DFBC12-06) or ER/PR+ (DFBC14-15) primary cultures were treated with vehicle control or THZ1 (250 nM) for 24 hr. Cells were subjected to the LIVE/DEAD Cell Viability Assay to distinguish live (green) and dead (red) cells.

(G) THZ1 inhibits RNAPII CTD phosphorylation and induces apoptosis in primary TNBC cells. Primary TNBC culture (DFBC12-58) was treated with vehicle control 
(first lane) or indicated concentrations of THZ1 for 24 hr before lysates were prepared for immunoblotting. 

See also Figure S1 and Movies S1 and S2.



176 Cell 163, 174-186, September 24, 2015 ©2015 Elsevier Inc.

[Figure 2 graphics]

(C) In vitro IC50 of THZ2

CDK     IC50 (nM)
CDK7    13.9
CDK8    6,830
CDK9    194
CDK1    96.9
CDK2    222
CDK5    134

(D) Cell lines: BT549, HCC1187, HCC38, HCC70, MDA-MB-468, SUM149, CAMA1, MDA-MB-415, T47D, ZR-75-1.
(E) Conditions: Con, THZ2, THZ1-R.
(F, G) Axis labels: Concentration (nM); Days. Treatment groups: Vehicle, THZ2 (10 mg/kg).
Figure 2. An Analog of THZ1 and the Effect of CDK7 Inhibition on the Growth of Triple-Negative Breast Tumors 

(A) Structure of THZ1 and THZ2. The groups of 4-acrylamide-benzamide in THZ1 and 3-acrylamide-benzamide in THZ2 are colored red. 

(B) Stability of THZ1 and THZ2 in vivo. Mice received a single dose of THZ1 or THZ2 by tail vein injection, and blood samples were collected at different time points. Plasma concentrations of THZ1 and THZ2 were determined by liquid chromatography-tandem mass spectrometry (LC-MS/MS).

(C) In vitro IC50 values for THZ2 binding to the indicated CDKs. The LanthaScreen Eu Kinase Binding Assay (Invitrogen) was performed with the indicated CDKs and their associated cyclins in the presence of different concentrations of THZ2. The IC50 values indicate the affinity of THZ2 for the ATP-binding pocket of each CDK.

(D) Cell growth curve of breast cancer cells treated with increasing concentrations of THZ2 for 48 hr. Data are presented as mean ± SD. 

(E) Bright-field images of cells treated with vehicle control, THZ2 (370 nM), or THZ1-R (370 nM) for 2 days. 

(F) Cell growth curve of indicated TNBC cell lines that were treated with increasing concentrations of THZ2 for 7 days. Upon harvest, cells were fixed and stained 
with crystal violet, followed by extraction of the staining for the quantification of proliferation. Data are presented as mean ± SD. 

nanomolar doses also efficiently suppressed the clonogenic growth of TNBC cells (IC50 of ~10 nM; Figures 2F and S2B). Like THZ1, THZ2 induced apoptotic cell death in triple-negative, but not ER/PR+, breast cancer cells or normal human cells (Figure S2A) and did not cause an obvious alteration in the cell cycle (Figure S2C). Therefore, we have identified an analog of THZ1 with improved pharmacokinetic properties and comparable potency, which we elected to use for further investigations.
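The practical impact of a longer plasma half-life can be illustrated with simple first-order elimination arithmetic. The sketch below assumes single-exponential decay and reads the reported 5-fold improvement as a THZ2 half-life of 5 × 45 = 225 min; both are simplifying assumptions for illustration, not measured kinetics, and the function names are hypothetical.

```python
import math

THZ1_HALF_LIFE_MIN = 45.0                      # mouse plasma, from the text
THZ2_HALF_LIFE_MIN = 5 * THZ1_HALF_LIFE_MIN    # assumption: "5-fold" read as 5x longer t1/2

def fraction_remaining(t_min, half_life_min):
    """Fraction of the initial plasma concentration left after t_min minutes,
    assuming simple first-order (single-exponential) elimination."""
    return 0.5 ** (t_min / half_life_min)

def elimination_rate_constant(half_life_min):
    """First-order elimination rate constant k (1/min), from k = ln 2 / t1/2."""
    return math.log(2) / half_life_min

if __name__ == "__main__":
    for t in (45, 90, 180, 360):
        print(f"{t:>3} min: THZ1 {fraction_remaining(t, THZ1_HALF_LIFE_MIN):.3f}, "
              f"THZ2 {fraction_remaining(t, THZ2_HALF_LIFE_MIN):.3f}")
    # At 180 min, first-order kinetics leave 0.5**4 = 6.25% of THZ1
    # versus 0.5**0.8 (about 57%) of THZ2.
```

Under these assumptions the gap widens with time, which is why a modest fold change in half-life can matter for twice-daily in vivo dosing.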

CDK7 Inhibition Suppresses the Growth of Triple-Negative Breast Tumors

We proceeded to investigate whether CDK7 inhibitors would exhibit efficacy in vivo. Treating mice intraperitoneally with THZ2 at 10 mg/kg twice daily did not cause overt toxicity, such as loss of body weight or behavioral changes (data not shown). To test whether THZ2 has a therapeutic effect on triple-negative breast tumors in an orthotopic xenograft model, we transplanted triple-negative breast tumor cells (MDA-MB-231) into the mammary fat pads of nude mice. When tumors reached ~200 mm³, mice were treated with vehicle or THZ2 (10 mg/kg). Continuous treatment with THZ2 for 25 days did not affect body weight (Figure S2D), indicating that THZ2 is well tolerated in nude mice. The growth rate of tumors in mice treated with THZ2 was markedly reduced compared with that of control tumors (Figure 2G), demonstrating the antitumor activity of THZ2. We also harvested tumors following short-term (50 hr) or long-term (25 days) treatment and found that both acute and chronic exposure to THZ2 significantly reduced CTD phosphorylation of RNAPII at all three phosphorylation sites (S2, S5, and S7; Figures 2H and S2E), indicating that CDK7 was efficiently targeted in the tumor cells. Compared with vehicle-treated tumors, tumor tissues from mice treated with THZ2 showed reduced proliferation and increased apoptosis, as indicated by immunostaining for Ki67 and cleaved Caspase 3, respectively (Figure S2F). Together, these findings indicate that the CDK7 inhibitor efficiently reduced tumor cell proliferation and induced cell death in vivo.

We further evaluated the antitumor effect of CDK7 inhibition in two independent PDX models of triple-negative breast tumors, DFBC11-26 and DFBC13-11. Both PDX models were established from patients with metastatic TNBC who had progressed on multiple lines of chemotherapy. Tumor fragments were transplanted into the mammary fat pads of NOD-SCID mice. Our first experiment with THZ2 in NOD-SCID mice led to reduced body weight, suggesting that THZ2 might be less well tolerated in this particular mouse strain. We therefore proceeded to use THZ1 in the PDX models of TNBC. When tumors grew to an average size of ~80 mm³, mice were treated with THZ1. Although THZ1 has poor pharmacokinetic properties, treating mice with this drug substantially blocked tumor growth in both patient-derived tumor models (Figures 2I and 2J). Notably, THZ1 treatment resulted in a loss of tumor cellularity and disease regression (Figures 2J and 2K). Analysis of tumor tissues also demonstrated markedly decreased CTD phosphorylation of RNAPII and induced PARP cleavage, an indicator of apoptotic cell death (Figure 2L). These results indicate that CDK7 inhibition has potent antitumor activity against patient-derived TNBC in vivo.

TNBC Cells Are Highly Dependent on CDK7 

To complement the pharmacological studies, which have the potential for unanticipated “off-target” effects, we first used short hairpin RNA (shRNA) to decrease expression of CDK7 in a variety of breast cancer cell lines. Using doxycycline-inducible shRNA vectors targeting multiple independent sequences of CDK7, we reduced the abundance of CDK7 protein by ~20%-50% (Figure S3A). This modest reduction in CDK7 abundance was sufficient to inhibit the growth of triple-negative, but not ER/PR+, breast cancer cells (Figure S3B).

To further corroborate these results, we used CRISPR/Cas9 to edit the CDK7 gene in five TNBC cell lines. Introducing constructs encoding either of two independent single-guide RNAs targeting CDK7 (sg_CDK7) led to a substantial reduction of CDK7 protein (Figure S3C) and suppressed cell growth preferentially in triple-negative, but not ER/PR+, breast cancer cells (Figure 3A). Notably, sg_CDK7 strongly impaired tumor formation by orthotopically transplanted TNBC cells (Figure 3B). As observed with the CDK7 inhibitor, sg_CDK7 also induced apoptotic cell death in TNBC cells (Figure 3C) and had little effect on cell-cycle distribution (Figures 3D and S3D). Thus, both shRNA-mediated knockdown and CRISPR/Cas9-mediated editing of CDK7 produce effects on TNBC cells that phenocopy pharmacologic inhibition of CDK7. These results support the view that CDK7 is the pharmacologically relevant target of the inhibitor and that CDK7 represents a bona fide therapeutic target in TNBC.

CDK7 Is a Uniquely Important Transcriptional CDK for TNBC

CDK7 is one of the transcriptional CDKs that regulate the initiation or elongation of RNAPII-mediated transcription (Zhou



(G) Growth of triple-negative breast tumors (MDA-MB-231) in nude mice treated with vehicle (n = 8) or THZ2 (n = 7; 10 mg/kg intraperitoneal). Mean ± SEM values are presented; *p < 0.05 (Student’s t test).

(H) Immunoblotting of tumor lysates harvested from nude mice treated with vehicle or THZ2 (10 mg/kg intraperitoneal) for 2 days. Tumors were isolated 3 hr after the last treatment and processed into RIPA lysates. Three independent samples from each treatment were loaded in duplicate.

(I) Growth of patient-derived triple-negative breast tumors (DFBC11-26) in NOD-SCID mice treated with vehicle (n = 4) or THZ1 (n = 6; 10 mg/kg intraperitoneal). Mean ± SEM values are presented; **p < 0.01 (Student’s t test).

(J) Growth of patient-derived triple-negative breast tumors (DFBC13-11) in NOD-SCID mice treated with vehicle (n = 6) or THZ1 (n = 5; 10 mg/kg intraperitoneal). Mean ± SEM values are presented; *p < 0.05 and **p < 0.01 (Student’s t test).

(K) H&E staining of tissue sections (DFBC13-11) showing tumor regression after THZ1 treatment. Note that the THZ1-treated tumor shows a loss of cellularity compared to control. Images on the left and right were captured using 4x and 10x objective lenses, respectively.

(L) Immunoblotting of tumor lysates (DFBC11-26) harvested from mice treated with vehicle or THZ1 (10 mg/kg intraperitoneal) for 21 days. Samples (two and three for the vehicle- and THZ1-treated groups, respectively) were loaded in duplicate.

See also Figure S2 and Table S1.
Figure 3. Loss of CDK7 Impairs TNBC Cell Growth and Tumorigenesis

(A) Loss of CDK7 in TNBC cells impairs cell viability and proliferation. The left, middle, and right panels show the bright-field images, the crystal violet staining of cells, and the quantification of cell growth, respectively. Data in the right panel are presented as mean ± SD; *p < 0.01 and ***p < 0.0001 (Student’s t test).

(B) Tumor volume of xenografts derived from cells infected with sg_GFP or sg_CDK7 (sg_CDK7_2 in Figure 3A). Cells were infected with lentivirus, selected with puromycin for 2 days, and then harvested for transplantation. Two million MDA-MB-468 or MDA-MB-231 cells or 4 million SUM149 cells (viability > 94% for all groups, assayed by trypan blue exclusion) were transplanted into the mammary fat pads of nude mice. Tumor volume was measured 4 weeks after transplantation for SUM149 and MDA-MB-231 and 5 weeks after transplantation for MDA-MB-468. Data are presented as mean ± SEM, with p values indicated. The right panel shows immunoblotting of the cultured cells used for transplantation. Note that the protein abundance of CDK7 was efficiently decreased by sg_CDK7.

(C) Immunoblotting of lysates from cells transduced with CRISPR constructs. Lentivirus-infected, puromycin-selected cells were seeded in 6-well plates (20,000 cells per well) and harvested after 4 days. RIPA lysates were analyzed for apoptotic cell death (indicated by PARP and Caspase 3 cleavage).

(D) Cell-cycle analysis of cells infected with lentivirus encoding sg_GFP or either of two independent sgRNAs targeting CDK7. Cells were prepared as in (C) and then fixed for cell-cycle assay.

See also Figure S3. 



et al., 2012). The demonstration of CDK7 as a selective target for TNBC led us to ask whether other transcriptional CDKs might also serve as therapeutic targets. We used CRISPR/Cas9 to ablate six known CDKs implicated in transcriptional regulation: CDK7, 8, 9, 12, 13, and 19 (Figures 4A and S4). These studies demonstrated that, like CDK7, CDK9 was also required for clonogenic growth of TNBC cells (MDA-MB-231, MDA-MB-468, and BT549) (Figures 4B and 4C). CDK9 has been implicated in regulating transcriptional elongation and, physiologically, in the differentiation of multiple cell types (Shapiro, 2006). To determine whether CDK9 can also be targeted selectively in TNBC, we further ablated transcriptional CDKs in an ER/PR+ breast cancer line (ZR-75-1) (Figure S4). Notably, ER/PR+ breast cancer cells were also sensitive to sg_CDK9 but were largely unaffected by sg_CDK7 (Figures 4C and 4D). Together, these data suggest that, among these transcriptional CDKs, CDK7 is uniquely required for the survival and proliferation of TNBC cells.
CDK7-Dependent Transcription of an “Achilles Cluster” of TNBC Genes

Given the role of CDK7 in phosphorylating the RNAPII CTD and CDK9 at active genes (Drapkin et al., 1996; Glover-Cutter et al., 2009; Akhtar et al., 2009; Larochelle et al., 2012; Kwiatkowski et al., 2014), we expected that CDK7 inhibition would disrupt gene expression. Indeed, THZ1 treatment led to a dose-dependent reduction in steady-state mRNA levels in the tested breast cancer cell lines (Figures 5A and S5A). However, THZ1 treatment affected the proliferation of triple-negative, but not ER/PR+, breast cancer cells. Previous studies with other cancer cell types have shown that THZ1 treatment can cause selective loss of cancer-specific oncogene expression with concurrent loss of tumor cell viability (Kwiatkowski et al., 2014; Chipumuro et al., 2014; Christensen et al., 2014). We therefore hypothesized that a critical set of genes differentially expressed between TNBC and ER/PR+ cells may confer the particular sensitivity of TNBC cells to CDK7 inhibition.
To test this hypothesis, we first identified genes that are overex- 



Figure 4. Unique Dependence of TNBC Cells on CDK7

(A) Immunoblotting of lysates from MDA-MB-468 cells that were infected with lentivirus encoding Cas9 and sgRNA targeting GFP or individual transcriptional CDKs. The asterisk (*) denotes a nonspecific signal for anti-CDK13.

(B) Requirement of transcriptional CDKs in the indicated TNBC cells. After infection and selection with puromycin (1.5 ng/ml, 48 hr), cells were seeded in 12-well plates (5,000 per well for MDA-MB-468, 10,000 per well for BT549). Cells were fixed after 11 days and stained with crystal violet.

(C) Quantification of cell proliferation. Cells were treated as in (B). The staining was subsequently extracted and its absorbance measured to quantify cell growth. Data are presented as mean ± SD; *p < 0.0001 (Student’s t test).

(D) Bright-field images of cells infected with virus encoding sg_GFP, sg_CDK7, or sg_CDK9. Cells were assayed as in (B) and imaged with an inverted microscope.

See also Figure S4. 



pressed in TNBC compared to ER/PR+ 
breast cancer lines. Genome-wide expression data were generated from two TNBC and two ER/PR+ cell lines over a THZ1 concentration course, and genes overexpressed in either TNBC line relative to the ER/PR+ lines were identified. Approximately 1,000 genes were overexpressed in TNBC lines relative to ER/PR+ lines; 451 of these were especially sensitive to treatment with THZ1 (greater than 1.5-fold loss of expression) (Figure 5B).
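The two-step filter described above (overexpressed in either TNBC line relative to the ER/PR+ lines, then more than 1.5-fold loss upon THZ1) can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis pipeline used here: the overexpression cutoff (2-fold above the higher ER/PR+ line) is a hypothetical placeholder because this section does not give one, THZ1 response is collapsed to a single log2 fold change per gene, and the gene names and values are invented.

```python
import math

# The 1.5-fold THZ1 cutoff is from the text; the overexpression cutoff is a
# hypothetical placeholder (the text does not state one here).
OVEREXPRESSION_LOG2 = 1.0              # assumed: >= 2-fold above every ER/PR+ line
THZ1_LOSS_LOG2 = math.log2(1.5)        # "greater than 1.5-fold loss of expression"

def tnbc_specific_genes(expr, tnbc_lines, er_lines):
    """expr maps gene -> {cell line -> expression (linear scale, > 0)}.
    A gene is called TNBC specific if it is overexpressed in EITHER TNBC
    line relative to the ER/PR+ lines (here: relative to their maximum)."""
    selected = []
    for gene, levels in expr.items():
        er_max = max(levels[line] for line in er_lines)
        if any(math.log2(levels[line] / er_max) >= OVEREXPRESSION_LOG2
               for line in tnbc_lines):
            selected.append(gene)
    return sorted(selected)

def thz1_sensitive(genes, thz1_log2fc):
    """Keep genes whose log2(THZ1 / vehicle) indicates a > 1.5-fold loss."""
    return sorted(g for g in genes if thz1_log2fc.get(g, 0.0) < -THZ1_LOSS_LOG2)

if __name__ == "__main__":
    # Toy values for hypothetical genes; not data from this study.
    expr = {
        "gene_a": {"BT549": 400, "MDA-MB-468": 350, "ZR-75-1": 50, "T47D": 40},
        "gene_b": {"BT549": 100, "MDA-MB-468": 900, "ZR-75-1": 60, "T47D": 80},
        "gene_c": {"BT549": 10, "MDA-MB-468": 20, "ZR-75-1": 800, "T47D": 600},
    }
    log2fc = {"gene_a": -1.2, "gene_b": -0.3, "gene_c": -2.0}
    specific = tnbc_specific_genes(expr, ["BT549", "MDA-MB-468"], ["ZR-75-1", "T47D"])
    print(specific)                          # ['gene_a', 'gene_b']
    print(thz1_sensitive(specific, log2fc))  # ['gene_a']
```

Comparing against the higher of the two ER/PR+ lines is one conservative reading of "relative to the ER/PR+ lines"; comparing against their mean would admit more genes.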

Gene ontology analysis of the TNBC-specific, THZ1-sensitive genes showed that they were significantly enriched for factors involved in signaling and transcriptional regulation (Figure 5C). Notably, the genes within these categories included a substantial number of signaling molecules and transcription factors with established roles in breast cancer, including components of TGFB, STAT, WNT, and EGFR/MET-mediated signaling (Bafico et al., 2004; Brand et al., 2014; Knight et al., 2013; Lu et al., 2014; Pukrop et al., 2006; Truong et al., 2014; Yang et al., 2011) (Figure 5D). Additionally, genes encoding transcription factors whose transcription is regulated by these signaling pathways in breast cancer, including MYC, ETS1, and the epithelial-to-mesenchymal transition-related transcription factors SOX9, TWIST1, and FOXC1, were enriched in this gene set (Guo et al., 2012; Lu et al., 2014; Scheel et al., 2011; Taube et al., 2010; Watabe et al., 1998; Xu et al., 2010; Yang et al., 2004). The majority of these signaling components and transcription factors were commonly expressed in both TNBC cell lines and patient-derived primary cells (Tables S2 and S3A). We thus identified genes showing TNBC-specific expression and sensitivity to THZ1 that encode transcriptional regulators and signaling factors, which are candidate mediators of the response to THZ1. This cluster of vital genes encoding transcriptional regulators and signaling factors in TNBC cells may thus collectively represent a TNBC-specific vulnerability, an “Achilles cluster,” for TNBC (Table S3B).

We next sought a mechanistic explanation for the particular sensitivity of Achilles cluster genes to THZ1 treatment and first noticed that 40% of the genes in the Achilles cluster were associated with super-enhancers in TNBC cells. In comparison, only 11% of all expressed genes are associated with super-enhancers in TNBC cells (p = 8.18 × 10⁻²⁰, chi-square test), and the majority of Achilles cluster genes (83%) are not associated with super-enhancers in ER/PR+ breast cancer cells (Figures 5E, S5B, and S5C and Table S4). Previous work has shown that super-enhancers concentrate components of the transcriptional apparatus to drive high-level expression of their target genes (Hnisz et al., 2013; Lovén et al., 2013; Whyte et al., 2013). The extraordinary reliance of super-enhancer-driven genes on transcriptional regulators, including CDK7, may confer special sensitivity to transcriptional inhibitors like THZ1 (Kwiatkowski et al., 2014; Lovén et al., 2013; Chapuy et al., 2013; Chipumuro et al., 2014). We therefore hypothesized that the expression of these super-enhancer-associated Achilles cluster genes might be more sensitive to THZ1 than that of other TNBC genes. Indeed, microarray expression analysis confirmed that expression of genes associated with super-enhancers in TNBC is particularly sensitive to THZ1 treatment (Figure S5D). Super-enhancers also serve as a platform for regulation by multiple signaling pathways, and perturbation of signaling pathway components can have a profound effect on super-enhancer-associated genes (Hnisz et al., 2015). The super-enhancers associated with genes in the Achilles cluster show a significant enrichment of DNA-binding motifs for terminal effector transcription factors of signaling pathways (Figure 5F). Taken together, these results suggest that the super-enhancer-driven Achilles cluster genes may be sensitive to THZ1 as a result of their dependency on CDK7 and their interconnected regulation by signaling pathways whose components are encoded by genes that are themselves sensitive to THZ1.
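The 40% versus 11% comparison above rests on a 2x2 contingency test. A minimal sketch of an uncorrected Pearson chi-square test follows, using only the standard library; for 1 degree of freedom the p value equals erfc(sqrt(chi2 / 2)). The 66 of 166 SE-associated Achilles cluster genes come from Figure 5E, but the background of 10,000 expressed genes is a hypothetical round number chosen only to match the stated 11%, so this example does not reproduce the reported p value. Whether the authors applied a continuity correction is not stated here.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the 2x2 table
        [[a, b],    e.g. Achilles genes:        SE-associated, not SE-associated
         [c, d]]         other expressed genes: SE-associated, not SE-associated
    Returns (chi2, p) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 df, the chi-square survival function equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

if __name__ == "__main__":
    # 66 of 166 Achilles cluster genes are SE-associated (Figure 5E).
    # The background is HYPOTHETICAL: 10,000 expressed genes, 11% SE-associated.
    achilles_se, achilles_not = 66, 166 - 66
    other_se, other_not = 1100 - 66, (10000 - 1100) - (166 - 66)
    chi2, p = chi_square_2x2(achilles_se, achilles_not, other_se, other_not)
    print(f"chi2 = {chi2:.1f}, p = {p:.2e}")
```

Here the Achilles cluster is compared with all other expressed genes; comparing it with all expressed genes (including itself), as the sentence above phrases it, changes the table but not the logic.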

We next asked whether THZ1 would induce similar gene expression changes in primary TNBC cells. Gene set enrichment analysis indicated that the genes most sensitive to THZ1 in primary TNBC cells are enriched for the TNBC-specific Achilles cluster genes (Figure 5G). Indeed, quantitative PCR (qPCR) confirmed that, in primary TNBC cultures, the expression of selected Achilles cluster genes was particularly vulnerable to CDK7 inhibition (Figure S5E).
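Gene set enrichment analysis summarizes whether a gene set, here the Achilles cluster, sits near the top of a ranked list, here genes ordered by THZ1-induced downregulation. A minimal sketch of the weighted running-sum enrichment score that GSEA is built on follows; it is an illustration, not the GSEA software. The permutation-based p values that GSEA supplies are not reproduced, and the gene names and scores are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Weighted running-sum enrichment score in the style of GSEA.

    ranked_genes: genes ordered from most to least affected (e.g. most strongly
                  downregulated by THZ1 first); scores: matching ranking metric.
    gene_set: genes to test (e.g. Achilles cluster genes).
    Returns the signed maximum deviation of the running sum from zero."""
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list but not cover all of it")
    # Hits step up in proportion to |score|**weight; each miss steps down equally.
    hit_norm = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        running += (abs(s) ** weight) / hit_norm if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

if __name__ == "__main__":
    # Toy ranked list (most downregulated first); hypothetical values.
    genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
    scores = [3.0, 2.5, 2.0, 1.0, 0.5, 0.2]
    print(enrichment_score(genes, scores, {"g1", "g2"}))   # positive: set at the top
    print(enrichment_score(genes, scores, {"g5", "g6"}))   # negative: set at the bottom
```

With weight = 0 the statistic reduces to an unweighted Kolmogorov-Smirnov-like walk; weight = 1 is the common default that lets strongly affected genes count more.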

TNBC Cells Are Addicted to the Expression of Achilles Cluster Genes

Next, we sought to confirm that components of the Achilles cluster are essential for TNBC cells and are thus likely to contribute to the cellular response of TNBC cells to CDK7 inhibition. To reflect the composition of the Achilles cluster, we chose for this analysis eight candidate Achilles cluster genes that encode super-enhancer-associated transcriptional regulators and signaling factors. We used CRISPR/Cas9-mediated gene editing to knock out these candidate genes and assessed how their functional loss affects TNBC proliferation and viability. We found that triple-negative (MDA-MB-468, BT549) breast cancer cells depended more heavily on EGFR, FOSL1, FOXC1, MYC, and SOX9 for proliferation than ER/PR+ (ZR-75-1) breast cancer cells did, whereas proliferation of the TNBC and ER/PR+ cell lines was similarly sensitive to loss of EN1, ETS1, and TWIST1 (Figures 6A and S6A). Using additional CRISPR vectors that target independent sequences of EGFR or SOX9, we confirmed that editing of these two genes suppressed cell growth and induced apoptotic cell death (Figures 6B, 6C, S6B, and S6C).

EGFR has been pursued as a therapeutic target for TNBC 
(Corkery et al., 2009; Ueno and Zhang, 2011). However, kinase 
inhibitors of EGFR have not produced satisfactory results in 



Figure 5. Genes Expressed Differentially in TNBC versus ER/PR+ Breast Cancer Cells and Sensitive to CDK7 Inhibition Indicate Critical Cellular Functions for TNBC Survival

(A) THZ1 treatment globally affects steady-state mRNA levels. BT549 and MDA-MB-468 TNBC cells were treated with THZ1 at the indicated concentrations for 
6 hr. Heatmaps display the Log2 fold change in gene expression versus vehicle control for the set of expressed transcripts. 

(B) Genes differentially expressed between TNBC and ER/PR+ breast cancer lines. Individual bars represent the difference in expression of a gene in TNBC cells versus ER/PR+ cells. Genes that were differentially expressed in either of two TNBC cell lines (BT549 and MDA-MB-468) relative to two ER/PR+ lines (ZR-75-1 and T47D) were identified as TNBC specific (right side of y axis). Genes that were differentially expressed in either of two ER/PR+ lines relative to two TNBC breast cancer lines were identified as ER/PR+ specific (left side of y axis). Genes whose expression decreased by 1.5-fold or greater upon treatment with THZ1 are colored (blue for TNBC specific; green for ER/PR+ specific). Log2 fold change between TNBC and ER/PR+ expression is shown along the x axis at the bottom of the image.

(C) Enriched Gene Ontology functional categories of TNBC-specific genes sensitive to THZ1 treatment. The top enriched molecular function GO categories are shown. Individual bars represent the Bonferroni-corrected p value for enrichment of specific gene ontology categories. Values for TNBC-specific, THZ1-sensitive genes are shown in blue. Values for ER/PR+-specific, THZ1-sensitive genes are shown in green.

(D) Depiction of signaling pathways and transcription factors that comprise Achilles cluster genes. Highlighted genes are found in the Achilles cluster. 

(E) Achilles cluster genes are enriched among super-enhancer-associated genes. Venn diagram showing the overlap (66) between the genes that comprise the Achilles cluster (166) and genes that have TNBC super-enhancers (SE) in either MDA-MB-468 or BT549 (1,207) (top). Total H3K27Ac ChIP-seq signal (length × density) in enhancer regions for all stitched enhancers in the MDA-MB-468 TNBC cell line, with enhancers ranked by increasing H3K27Ac ChIP-seq signal (bottom). Highlighted super-enhancers are associated with selected members of the Achilles cluster; the top super-enhancer for each SE-associated gene is shown.

(F) Enrichment of DNA-binding motifs targeted by signaling transcription factors in constituent enhancers of super-enhancers regulating Achilles cluster genes in 
TNBC cells. The motif bound by the CTCF transcription factor is not enriched in the super-enhancers associated with Achilles cluster genes and is used as a 
negative control. 

(G) Genes most strongly downregulated by THZ1 treatment in patient-derived TNBC primary cells are enriched for Achilles cluster genes. Gene set enrichment analysis of Achilles cluster genes in comparison to genes downregulated in TNBC primary cultures (DFBC12-06, DFBC12-58, DFBC13-11) following treatment with THZ1 (250 nM) for 6 hr. GSEA-supplied p values are given.

See also Figure S5 and Tables S2, S3, and S4. 
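Figure 5E ranks stitched enhancers by total H3K27Ac signal, the ranking used to call super-enhancers in the cited work (Whyte et al., 2013; Lovén et al., 2013). A simplified sketch of the "hockey stick" cutoff commonly used for that call follows: enhancers are sorted by increasing signal, and the cutoff is the point where a line with the slope of the diagonal, (max - min) / n, is tangent to the curve, i.e. where signal - slope × rank is smallest. Treat this as an illustrative re-implementation of the idea under these assumptions, not the pipeline used here; stitching of nearby enhancers, input-control subtraction, and promoter exclusion are omitted, and the enhancer names and signals are hypothetical.

```python
def super_enhancer_cutoff(signals):
    """Return the signal value at the 'hockey stick' inflection point.

    signals: total ChIP-seq signal (e.g. H3K27Ac, length x density) for each
    stitched enhancer. Negative values are clamped to zero."""
    if len(signals) < 2:
        raise ValueError("need at least two enhancers")
    ranked = sorted(max(s, 0.0) for s in signals)   # rank by increasing signal
    n = len(ranked)
    slope = (ranked[-1] - ranked[0]) / n             # slope of the diagonal
    # The tangent point of a line with this slope minimizes signal - slope * rank.
    best_i = min(range(n), key=lambda i: ranked[i] - slope * (i + 1))
    return ranked[best_i]

def call_super_enhancers(enhancer_signals):
    """enhancer_signals: {enhancer_id: signal}; returns ids above the cutoff."""
    cutoff = super_enhancer_cutoff(list(enhancer_signals.values()))
    return sorted(e for e, s in enhancer_signals.items() if s > cutoff)

if __name__ == "__main__":
    # Hypothetical signals: many typical enhancers and a few very strong ones.
    signals = {f"enh{i:03d}": 1.0 + 0.01 * i for i in range(95)}
    signals.update({"enh095": 50.0, "enh096": 80.0, "enh097": 120.0,
                    "enh098": 200.0, "enh099": 400.0})
    print(call_super_enhancers(signals))
```

The geometric cutoff adapts to each dataset's signal distribution, which is why it is preferred over a fixed signal threshold when comparing cell lines.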
Figure 6. Functions of Achilles Cluster Genes in TNBC Cells

(A) CRISPR/Cas9-mediated editing of selected TNBC Achilles cluster genes in a TNBC cell line. MDA-MB-468 cells were infected with lentivirus encoding the indicated sgRNA and selected with puromycin. (Left) Immunoblotting for the expression of the indicated genes. (Right, top) Cells were seeded in 12-well plates (5,000 cells per well), harvested after 10 days, and stained with crystal violet; the staining was extracted for the quantification of cell growth (right, bottom). Data are presented as mean ± SD; *p < 0.001.

(B) Additional CRISPR vectors decrease the protein abundance of EGFR. Vectors encoding sg_EGFR_2 and sg_EGFR_3 were tested along with sg_EGFR_1 (sg_EGFR in Figure 6A). Protein lysates were harvested for immunoblotting. Cleaved PARP was used as a marker for apoptotic cell death and Vinculin as a loading control.

(C) CRISPR/Cas9-mediated gene editing of EGFR impairs cell proliferation. MDA-MB-468 cells were treated as in (A) for measurement of cell proliferation; *p < 0.001.

(D) Proliferation of TNBC cells (top, MDA-MB-468; bottom, DFBC12-58) treated with increasing concentrations of EGFR inhibitors or THZ1. Cells were harvested after 3 days for measurement of cell proliferation.

(E) TNBC cells (top, MDA-MB-468; bottom, 
DFBC12-58) were treated with vehicle control or 
indicated EGFR inhibitors for 30 min. Cell lysates 
were harvested for immunoblotting. 

See also Figure S6. 



TNBC clinical trials (Carey et al., 2012). We used three independent kinase inhibitors known to target EGFR (erlotinib, gefitinib, and lapatinib) and found that EGFR kinase inhibition largely spared TNBC cells (Figures 6D and S6D), despite evident suppression of EGFR autophosphorylation and downstream MAPK phosphorylation by these inhibitors (Figures 6E and S6E). These data indicate the existence of kinase-independent functions of EGFR that are essential for TNBC cell growth and survival (Weihua et al., 2008) and further suggest that targeting the transcription of EGFR, as achieved by CDK7 inhibition, provides a unique advantage that cannot be achieved by inhibitors of EGFR kinase activity. Together, these data show that targeting CDK7-dependent transcription is an effective means to collectively suppress the expression of multiple oncogenes that are critical for the proliferation and viability of TNBC cells.

DISCUSSION 

Triple-negative breast cancer is a highly aggressive subtype of breast cancer that lacks effective therapeutics, due in part to the genetic complexity that has limited the development of “targeted” therapies. Despite the heterogeneity of this disease, TNBC cells share a similar transcriptional program, suggesting that tumors of this subtype may be highly dependent on expression of at least a subset of the active genes in these cells. We found that TNBC cells are exceptionally dependent on the transcriptional cyclin-dependent kinase CDK7 and that a cluster of TNBC-specific genes is especially sensitive to CDK7 inhibition. Our results thus indicate that CDK7 mediates transcriptional addiction to this vital cluster of genes in TNBC and that CDK7 inhibition represents a highly promising therapy for this subtype of breast cancer.

CDK7 inhibition revealed an “Achilles cluster” of genes in the 
TNBC transcriptional program that are likely to be responsible, at 
least in part, for rendering these cells selectively sensitive to 
THZ1 treatment. These genes were identified by their overex- 
pression in TNBC cells relative to ER/PR+ cells, their sensitivity 
to THZ1 , and their involvement in transcriptional regulation and 
in signaling. This group contains putative oncogenes that are 
misregulated in the triple-negative disease state and essential 
for TNBC tumorigenicity. For example, loss of transcription fac- 
tors FOSL1 or SOX9 dramatically impairs the tumorigenic po- 
tency of TNBC cells (Tam et al., 2013; Wang et al., 2013) (Figures 
6 and S6). Similarly, perturbation of MET- and EGFR-mediated 
signaling reduces TNBC cell proliferation (Brand et al., 2014; 
Hsu et al., 2014; Sohn et al., 2014; Figures 6 and S6).

A striking number of genes in the TNBC “Achilles cluster” were 
associated with super-enhancers in triple-negative, but not in 
ER/PR+, breast cancer cells, suggesting a mechanism that 
may contribute to their sensitivity to CDK7 inhibition (Chipumuro 
et al., 2014; Christensen et al., 2014; Kwiatkowski et al., 2014). 
Transcription of many of these genes is known to be regulated 
by signaling pathways whose members are represented in the 
cluster, and super-enhancers are thought to serve as a platform 
for regulation of transcription by signaling pathways. Thus,
super-enhancers associated with genes in the Achilles cluster
may confer sensitivity to THZ1 by two means: (1) dependency 
on high levels of transcription apparatus that includes the imme- 
diate target of the drug and (2) dependency on a functionally in- 
terconnected network of transcription factors and signaling 
components. Other genes in the cluster do not appear to be 
associated with super-enhancers, however, so there are likely 
to be other reasons for their sensitivity to CDK7 inhibition. For 
example, these genes may depend on continuous expression 
of some of the super-enhancer-driven transcription factors and 
may thus be affected secondarily. It is also possible that tran- 
scriptional control of this group is more dependent on CDK7 
function than others (Kanin et al., 2007), perhaps because they 
cannot utilize alternative pathways of RNA Pol II CTD phosphorylation,
such as that enabled by Erk1/2 (Tee et al., 2014).

The “Achilles cluster” of TNBC genes described here repre- 
sents a collection of genes encoding transcriptional regulators 
and signaling components that are overexpressed in multiple 
TNBC cells and sensitive to CDK7 inhibition, which differs from 
the well-established approach of seeking a signature for a can- 
cer subtype. Signatures typically include genes that are com- 
monly expressed in cancer subtypes; such signatures are 
especially valuable when it is likely that a common gene or set 
of genes must be responsible for a phenotype. In contrast, 
when the subtype is genetically heterogeneous, we suggest 
that it is valuable to compile a larger collection of genes (the 
union of genes in multiple samples, rather than the intersection) 
that are affected by transcriptional inhibition in multiple cell lines 
or patient samples concurrent with a cellular phenotype because 
dysregulation of different subsets of the genes in the cluster may
produce the same phenotype in a cancer subtype with a com- 
plex genotype. The benefit of this approach is the potential to 
explain how tumor cells that are genetically heterogeneous 
may be dependent on diverse, yet overlapping, sets of genes. 

The strategy of targeting transcription of a cluster of cancer- 
specific genes, as described here for TNBC, may be applicable 
to other difficult-to-treat cancers. Recent large-scale efforts 
have found that TNBC gene expression patterns are highly 
correlated with aggressive ovarian cancer and lung squamous 
carcinomas (Cancer Genome Atlas Network, 2012; Hoadley 
et al., 2014). As with TNBC, these cancers have a high mutation 
rate, an extremely high prevalence of p53 mutations, and lack a 
commonly altered genetic event that can be targeted for thera- 
peutic intervention (Cancer Genome Atlas Research Network, 
2011, 2012). Thus, it is possible that various aggressive tumors
develop transcriptional addictions to clusters of genes that are 
misregulated and dependent on CDK7, and if so, CDK7 inhibition 
might be a useful therapy for such cancers.

In summary, we have discovered a CDK7-dependent tran- 
scriptional addiction in triple-negative breast cancer and identi- 
fied CDK7 inhibition as a highly selective and potent means to 
disrupt expression of a key cluster of genes. Our study demon- 
strates that inhibition of transcription is an effective strategy to 
target highly aggressive breast cancers with high genetic hetero- 
geneity and lacking obvious “driver” oncogenes. Further studies 
will be required to determine whether these observations will 
translate to clinical treatment of human breast cancer. 



EXPERIMENTAL PROCEDURES 
Cell Culture 

Human breast cancer cell lines were grown in RPMI-1640, 10% fetal bovine 
serum, and 1% penicillin/streptomycin. For gene knockdown assays, cells 
were infected with lentivirus encoding sgRNA or tetracycline-inducible shRNA. 
Details of cell culture, construction of plasmids, and viral infection are 
described in the Supplemental Experimental Procedures. 

Animal Studies 

All animal experiments were conducted in accordance with the animal use 
guidelines from the NIH and with protocols approved by the Dana-Farber Cancer
Institute Animal Care and Use Committee. Full details are described in the
Supplemental Experimental Procedures.

ChIP-Seq and Data Analysis 

ChIP was performed as previously described (Lee et al., 2006), using
anti-H3K27ac (Abcam, AB4729A). Details of ChIP-seq and data analysis are
described in the Supplemental Experimental Procedures. ChIP-Seq and 
gene expression microarray data are deposited in GEO: GSE69107. 

Data Analysis of Gene Expression 

To calculate differential expression, Log2 signal intensities were used. For 
each transcript, the maximum Log2 signal intensity from either of the two 
ER/PR+ breast cancer cell lines was subtracted from the maximum Log2 
signal intensity for that transcript in either of the two TNBC cell lines. Tran- 
scripts with a difference of +2 or greater were classified as more expressed 
in TNBC cells. Transcripts with a difference of -2 or less were classified as 
more expressed in ER/PR+ breast cancer cells. For sensitivity to THZ1 treatment,
transcripts were considered sensitive if the expression level declined by
more than 1.5-fold upon treatment with 250 nM THZ1. Any gene with one
or more transcripts that passed the two criteria described above was consid- 
ered for further analysis. For gene ontology analysis, the DAVID suite of online 
tools (http://david.abcc.ncifcrf.gov/tools.jsp) was used to interrogate the mo- 
lecular function ontology defined by the Gene Ontology Consortium. 
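The selection rules above can be summarized in a short Python sketch. It assumes that Log2 intensities are already available for each transcript, and that the "two criteria" are (i) higher expression in TNBC cells and (ii) THZ1 sensitivity. The function names and the input layout (dicts with hypothetical `gene`, `tnbc`, `erpr`, `untreated`, and `treated` keys) are illustrative only, not the pipeline actually used.

```python
import math
from typing import Iterable, Mapping, Sequence, Set


def classify_transcript(tnbc_log2: Sequence[float], erpr_log2: Sequence[float],
                        threshold: float = 2.0) -> str:
    """Compare the maximum Log2 intensity in the TNBC lines with that in the ER/PR+ lines."""
    diff = max(tnbc_log2) - max(erpr_log2)
    if diff >= threshold:
        return "TNBC"        # difference of +2 or greater
    if diff <= -threshold:
        return "ER/PR+"      # difference of -2 or less
    return "unchanged"


def thz1_sensitive(log2_untreated: float, log2_treated: float, fold: float = 1.5) -> bool:
    """True if expression declines by more than `fold` after THZ1 treatment.

    On a Log2 scale, a decline greater than 1.5-fold is a drop greater than log2(1.5).
    """
    return (log2_untreated - log2_treated) > math.log2(fold)


def select_genes(transcripts: Iterable[Mapping]) -> Set[str]:
    """Keep a gene if at least one of its transcripts passes both criteria."""
    selected = set()
    for t in transcripts:
        if (classify_transcript(t["tnbc"], t["erpr"]) == "TNBC"
                and thz1_sensitive(t["untreated"], t["treated"])):
            selected.add(t["gene"])
    return selected
```

Working on the Log2 scale turns both thresholds into simple differences: ±2 corresponds to a fourfold difference, and a 1.5-fold decline corresponds to a drop of about 0.585.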
SUMMARY

Protein kinases control cellular responses to envi- 
ronmental cues by swift and accurate signal pro- 
cessing. Breakdowns in this high-fidelity capability 
are a driving force in cancer and other diseases. 
Thus, our limited understanding of which amino 
acids in the kinase domain encode substrate 
specificity, the so-called determinants of specificity 
(DoS), constitutes a major obstacle in cancer 
signaling. Here, we systematically discover several 
DoS and experimentally validate three of them, 
named the αC1, αC3, and APE-7 residues. We
demonstrate that DoS form sparse networks of 
non-conserved residues spanning distant regions. 
Our results reveal a likely role for inter-residue allo- 
stery in specificity and an evolutionary decoupling 
of kinase activity and specificity, which appear 
loaded on independent groups of residues. Finally, 
we uncover similar properties driving SH2 domain 
specificity and demonstrate how the identification 
of DoS can be utilized to elucidate a greater under- 
standing of the role of signaling networks in cancer 
(Creixell et al., 2015 [this issue of Cell]).

INTRODUCTION 

Cellular organization and response to external and internal cues 
relies on swift and precise processing of information through cell 
signaling networks. High fidelity in these circuits depends 
critically on the recognition and phosphorylation of specific sub- 
strates by protein kinases, and perturbations of this cellular sys- 
tem have been linked to significant evolutionary transitions 
(Capra et al., 2012; Skerker et al., 2008; Tan et al., 2009; Zarrin- 
par et al., 2003), as well as to disease progression, in particular, 
in cancer (Borrello et al., 1995; Creixell et al., 2012; Marengere 
et al., 1994; Santoro et al., 1995; Songyang et al., 1995). 

Cellular signaling fidelity is maintained essentially through two 
coupled mechanisms. At a macro-molecular level, protein
specificity ensures that each protein kinase will reach and interact
with its protein substrates. At a micro-molecular or atomic level, 
peptide specificity defines the ability of a given kinase domain 
present in all active protein kinases to recognize and phosphor- 
ylate a specific peptide within the protein substrate (Turk, 2008) 
(Figure 1A). A variety of experimental techniques have been 
developed to elucidate the peptide specificity for many modular 
signaling domains and obtain specificity profiles (e.g., the so- 
called Position-Specific Scoring Matrices, PSSMs), as a quanti- 
tative measure of the preference of each kinase domain for each 
amino acid residue at every peptide substrate position
(Figure S1). While other factors contributing to protein interaction
specificity at a macro-molecular level (such as co-localization, 
co-expression, docking motifs, and scaffold or adaptor proteins) 
have been described (Bhattacharyya et al., 2006; Linding et al., 
2007; Remenyi et al., 2005; Scott and Pawson, 2009), the com- 
bination of residues in the kinase domain that encode peptide 
substrate specificity, the so-called determinants of specificity 
(DoS), has remained largely elusive (Figure 1B). Even though
some structural studies have helped identify residues that are
in close contact with the substrate peptide, which likely influence
specificity (Brinkworth et al., 2003; Ellis and Kobe, 2011; Hanks 
and Hunter, 1995; Mok et al., 2010; Nolen et al., 2004), these 
studies were largely focused on specific kinase families and/or 
non-human species as well as limited in scope by the small 
number of kinase-peptide structures currently available and an 
inability to capture potentially long-range DoS. 

Here, we present a computational approach that aims to 
overcome these limitations and address the following open 
questions. Which residues within the kinase domain contribute 
to peptide specificity (constituting the so-called DoS)? Are 
these determinants just a small group of residues localized in 
close proximity to the substrate as currently thought, or do 
they form a sparse network of residues instead (Figure 1C)? 
Are such principles of domain-peptide specificity conserved in 
other domains? Finally, how do these DoS relate, spatially and 
functionally, to those residues known to be involved in the regu- 
lation and catalytic activity of the kinase domain? In other 
words, are these different functionalities loaded onto the same 
residues or on independent groups of residues, and how did 
they evolve? 
Figure 1. Open Questions in Protein Domain-Peptide Specificity

(A) Protein specificity determines the interaction between the whole kinase protein and its substrates and is driven by processes such as interactions between 
other domains and motifs (e.g., SH2 and phospho-tyrosine in this figure), co-expression of the two proteins, cellular localization, scaffold proteins, etc. 

(legend continued on next page) 
As we demonstrate in our accompanying article (Creixell et al.,
2015 [this issue of Cell]), which explores how cancer mutations
affect domain specificity by integrating the DoS identified here, 
resolving these questions could represent a valuable contribu- 
tion not only for basic signaling biology but also for cancer 
research. 

RESULTS 

Learning about Residue Contributions to Specificity by 
Sampling over Different Specificity Masks 

When investigating the relationship between kinases at the 
domain primary sequence similarity level and at the substrate 
sequence motif similarity level (using specificity profiles or 
PSSMs derived from Positional Scanning Peptide Library or 
PSPL experiments; see Experimental Procedures and Figure S1),
it is apparent that, when considering the domain in
its entirety, no strong linear correlation exists between the two
(Figure S1). We hypothesized that this lack of correlation could
indicate that substrate specificity is not encoded by the domain 
as a whole. Instead, we hypothesized that a limited number 
of residues contribute to specificity, and that those that do
contribute are likely to do so to different degrees. In order
to capture this principle, we introduced the specificity mask
as a fundamental entity in our approach. As depicted in Figures
1B and 2 (small box), a specificity mask is defined as a
particular combination of contributions to specificity from 
the different residues in the kinase domain. For example, an 
extreme hypothesis where all residues within the kinase domain 
contribute equally to specificity would be represented by all 
entries in a mask with the same score (e.g., 0.5). Instead, a 
situation where a single residue, X, would drive specificity 
would be represented by all entries scoring 0.0 except position 
X scoring 1.0.
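The two extreme masks described above, and the way a mask reweights a sequence comparison, can be illustrated with a minimal Python sketch. We assume, for illustration only, that the two kinase domains are already aligned and that amino acid similarity is reduced to simple identity. The function names are hypothetical.

```python
from typing import List, Sequence


def uniform_mask(n_positions: int, value: float = 0.5) -> List[float]:
    """Every kinase-domain position contributes equally to specificity."""
    return [value] * n_positions


def single_residue_mask(n_positions: int, x: int) -> List[float]:
    """Only position x drives specificity (1.0); all other positions score 0.0."""
    mask = [0.0] * n_positions
    mask[x] = 1.0
    return mask


def masked_identity(a: str, b: str, mask: Sequence[float]) -> float:
    """Fraction of the total mask weight carried by positions where two aligned domains agree."""
    total = sum(mask)
    if total == 0:
        return 0.0
    return sum(w for w, x, y in zip(mask, a, b) if x == y) / total
```

For two toy domains that differ only at position 2 ("ACDE" and "ACFE"), the uniform mask reports them as 75% similar. The single-residue mask on position 2 reports them as completely dissimilar, which shows how the same pair of sequences can look close or distant depending on which residues the mask treats as determinants.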

Our approach (described below) explores the possibility that 
within a large ensemble of specificity masks, certain masks 
can discriminate between kinases with dissimilar substrate 
specificities better than others. These masks will range from 
those capturing very few and localized DoS (reminiscent of 
models explored in the structural studies; Brinkworth et al., 
2003; Ellis and Kobe, 2011; Hanks and Hunter, 1995; Mok
et al., 2010; Nolen et al., 2004) to those capturing a larger number
of determinants distributed more sparsely across the kinase 
domain (Figure 1C). As further detailed in the next section, since 
our aim was to identify new DoS following an unbiased data-driven
systematic approach, we did not impose any restrictions
on the set of specificity masks that can be found; instead, we
explore a large set of possible specificity masks and let the sys- 
tem evolve and find those showing the best discriminatory 
capabilities. 



The KINspect Methodology 

In order to identify which residues contribute to specificity, we 
developed a computational framework named KINspect, which 
explores a very large number of combinations of residues, and 
their contribution toward specificity, and subsequently identifies 
those featuring the best predictive capability (Figure 2). This type 
of approach, known in machine learning as learning classifier sys- 
tems (Lanzi et al., 2000), enables the selection of the best-per- 
forming set of specificity masks starting from a large initial set 
of random masks by following three consecutive steps (Figure 2). 

First, for each specificity mask, the specificity profiles (PSSMs) 
for each kinase are predicted by comparing all kinases across the 
human kinome at each amino acid position within the kinase 
domain (amino acid similarity) and by incorporating a weighting 
factor (from 0 to 1; 0 being not important, 1 being critical) of the
“specificity importance” of each position as determined by the 
given specificity mask. A PSSM for each kinase is then predicted 
by integrating the PSSMs for the other kinases using the mask- 
dependent similarity as a weighting factor. Naturally, the majority 
of masks within the original set of random masks will predict 
specificity poorly, but, as the system evolves, the masks will 
improve their predictive power, i.e., become more fit. 
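Step 1 can be sketched as a mask-weighted average of the other kinases' PSSMs. This is a simplified illustration, not the exact KINspect equations, which are given in the Supplemental Experimental Procedures and in Zhang et al., 2009. Several simplifications are assumptions: similarity is reduced to identity at aligned positions, each PSSM is flattened into a list of floats, and a uniform average is used as a fallback when every weight is zero. Following the Figure 2 legend, kinases from the query's own family are excluded.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: name -> (family, aligned kinase-domain sequence, flattened PSSM)
Kinase = Tuple[str, str, List[float]]


def masked_identity(a: str, b: str, mask: Sequence[float]) -> float:
    """Identity at aligned positions, weighted by the specificity mask."""
    total = sum(mask)
    if total == 0:
        return 0.0
    return sum(w for w, x, y in zip(mask, a, b) if x == y) / total


def predict_pssm(query: str, kinases: Dict[str, Kinase], mask: Sequence[float]) -> List[float]:
    """Predict the query's PSSM from other kinases, weighted by mask-dependent similarity."""
    q_family, q_seq, _ = kinases[query]
    weights, profiles = [], []
    for name, (family, seq, pssm) in kinases.items():
        # Skip the query itself and kinases of the same family (avoids over-fitting).
        if name == query or family == q_family:
            continue
        weights.append(masked_identity(q_seq, seq, mask))
        profiles.append(pssm)
    if not profiles:
        raise ValueError("no kinases outside the query's family")
    total = sum(weights)
    if total == 0:
        # No informative similarity: fall back to a plain average (an assumption).
        weights, total = [1.0] * len(profiles), float(len(profiles))
    n = len(profiles[0])
    return [sum(w * p[j] for w, p in zip(weights, profiles)) / total for j in range(n)]
```

A mask that up-weights the true determinants makes kinases with similar specificity look more similar, so their observed PSSMs dominate the weighted average.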

Second, masks are ranked according to their predictive perfor- 
mance (i.e., their ability to predict PSSMs that are similar to the 
experimentally determined PSSMs). In essence, masks that 
more closely capture the true contribution of each position within 
the kinase domain (i.e., those scoring higher at kinase domain 
positions that truly contribute to specificity) will result in a better 
prediction of the specificity profiles, thus ranking higher. 
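Step 2 can be sketched as follows, taking the Figure 2 legend's definition of fitness as the median difference between predicted and experimentally determined PSSMs. The distance used here (sum of absolute differences between flattened PSSMs) is an assumption for illustration. Lower error corresponds to higher fitness.

```python
from statistics import median
from typing import Dict, List, Sequence


def pssm_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of absolute differences between two flattened PSSMs (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def mask_error(predicted: Dict[str, Sequence[float]],
               observed: Dict[str, Sequence[float]]) -> float:
    """Median prediction error over all kinases with an observed PSSM (lower is fitter)."""
    return median(pssm_distance(predicted[k], observed[k]) for k in observed)


def rank_masks(masks: List, errors: List[float]) -> List:
    """Order masks from best (lowest error) to worst."""
    order = sorted(range(len(masks)), key=lambda i: errors[i])
    return [masks[i] for i in order]
```

Using the median rather than the mean keeps a few poorly predicted kinases from dominating the fitness of a mask.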

Third, the worst-performing masks are filtered out and new 
masks, representing both subtle (mutation) but also more abrupt 
(cross-over) variations of the best-performing masks, will be 
added. 
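Step 3 follows the usual pattern of a genetic algorithm. The sketch below keeps an elite of the best-ranked masks unchanged and refills the population with mutated copies (subtle variation) and single-point cross-overs (abrupt variation). The mutation rate, the 50/50 split between operators, and the single-point cross-over are illustrative assumptions, not the published KINspect settings.

```python
import random
from typing import List

Mask = List[float]


def mutate(mask: Mask, rng: random.Random, rate: float = 0.05) -> Mask:
    """Subtle variation: redraw each position's score with probability `rate`."""
    return [rng.random() if rng.random() < rate else w for w in mask]


def crossover(a: Mask, b: Mask, rng: random.Random) -> Mask:
    """Abrupt variation: join a prefix of mask a to the matching suffix of mask b."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def next_generation(ranked: List[Mask], rng: random.Random,
                    n_elite: int, size: int) -> List[Mask]:
    """Keep the elite masks unchanged and refill the population with their offspring."""
    elite = ranked[:n_elite]
    population = [list(m) for m in elite]
    while len(population) < size:
        if len(elite) >= 2 and rng.random() < 0.5:
            a, b = rng.sample(elite, 2)
            population.append(crossover(a, b, rng))
        else:
            population.append(mutate(rng.choice(elite), rng))
    return population
```

Because the elite is carried over unchanged, the best mask found so far is never lost between generations.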

These three steps are initially started with random specificity 
masks and repeated until convergence is reached and fitness 
cannot be optimized further. Residues consistently scoring 
higher in the specificity masks following the optimization 
procedure will be considered candidate DoS. For a more tech- 
nical description of the algorithm, please refer to Figure 2 and 
Extended Experimental Procedures. 
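The overall loop (score, rank, regenerate, repeat until fitness stops improving) can be sketched generically. The caller supplies an error function in place of the PSSM-based error of steps 1 and 2. The convergence rule (no improvement for `patience` generations), the population sizes, and the final thresholding of mean elite scores are assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Tuple

Mask = List[float]


def optimise_masks(error_fn: Callable[[Mask], float], n_positions: int,
                   rng: random.Random, pop_size: int = 20, n_elite: int = 5,
                   patience: int = 25, max_gens: int = 2000,
                   rate: float = 0.1) -> Tuple[List[Mask], List[float]]:
    """Iterate score/rank/regenerate until the best error stops improving."""
    if max_gens < 1 or n_elite < 2 or n_positions < 2:
        raise ValueError("need max_gens >= 1, n_elite >= 2 and n_positions >= 2")
    population = [[rng.random() for _ in range(n_positions)] for _ in range(pop_size)]
    history: List[float] = []
    best, stale = float("inf"), 0
    for _ in range(max_gens):
        ranked = sorted(population, key=error_fn)        # steps 1-2: score and rank
        history.append(error_fn(ranked[0]))
        if history[-1] < best:
            best, stale = history[-1], 0
        else:
            stale += 1
            if stale >= patience:                        # convergence: no further gain
                break
        elite = ranked[:n_elite]                         # step 3: keep the elite...
        population = [list(m) for m in elite]
        while len(population) < pop_size:                # ...and add offspring
            a, b = rng.sample(elite, 2)
            point = rng.randrange(1, n_positions)
            child = a[:point] + b[point:]                # cross-over
            child = [rng.random() if rng.random() < rate else w for w in child]  # mutation
            population.append(child)
    return ranked[:n_elite], history


def candidate_dos(elite: List[Mask], threshold: float = 0.8) -> List[int]:
    """Positions whose mean score across the final elite masks reaches the threshold."""
    n = len(elite[0])
    means = [sum(m[i] for m in elite) / len(elite) for i in range(n)]
    return [i for i, s in enumerate(means) if s >= threshold]
```

Since the elite survives each round, the best error recorded per generation can only stay the same or decrease, which is what makes "fitness cannot be optimized further" a usable stopping criterion.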

Model Robustness, Validation, and Coverage 

Since our method contains stochastic aspects (such as the 
starting set of random masks and the generation of new masks 
by mutation and cross-over), one initial question that must be 
addressed is whether the method is robust to this initial stochas- 
ticity, i.e., whether one would obtain similar results if the process 
was started with arbitrary initial conditions and evaluated inde- 
pendently several times. To this end, we compared the fitness 
evolution of ten independent KINspect evaluations and found 



(Bhattacharyya et al., 2006; Linding et al., 2007; Remenyi et al., 2005; Scott and Pawson, 2009). Peptide specificity, in contrast, is solely driven by the sequence 
and structure of the kinase domain and drives the phosphorylation of specific linear motifs within the substrate protein. 

(B) The so-called determinants of specificity (DoS) are those residues within a protein domain that together drive and determine the peptide specificity of the 
domain. 

(C) While relatively few localized DoS have been described in the kinase domain, this study explores the existence of more determinants and their relative domain 
positions. 
Figure 2. Overview of the KINspect Algorithm

The KINspect workflow is designed to identify the specificity mask that best describes the importance of the different residues for specificity. Different com- 
binations of contributions to specificity by different kinase domain residues are collected as specificity masks (top left), where a score between 0 and 1 is given to 
each position within the kinase domain. Originally, the specificity masks are initialized with random values to then follow a machine-learning procedure that will 
ensure the masks with the highest predictive power toward specificity are selected for and optimized. This procedure, known as a learning classifier system, is 
divided into three separate steps. 

In step 1, for each specificity mask the system loops over all query kinases and, using a kinase domain alignment, compares the query kinase to all other kinases
(except those belonging to the same kinase family, which are excluded only at this stage to avoid over-fitting) at the sequence level, generating a similarity vector.
This vector is combined with the specificity mask, so that similarity in high-scoring positions of the mask is reinforced and similarity in low-scoring positions of the
mask is silenced, effectively producing a mask-weighted similarity vector and sum score for each kinase. These values are subsequently used to integrate the 
different observed PSSMs into a combined predicted PSSM for the query kinase (as further explained by the equations and text in Supplemental Experimental 
Procedures section and in Zhang et al., 2009). 

In step 2, after a predicted PSSM has been generated for every kinase in our set, fitness is computed as the median of all the differences between the predicted
and the experimentally determined PSSM for all the kinases obtained from the NetPhorest repository (Miller et al., 2008). 

In step 3, the best-performing specificity masks are kept (“elite”), and new ones are generated by mutation (changing the value of a given position in the mask) and 
cross-over of the elite sequences (combining two segments of two other masks), as typically done in genetic algorithms. Once a new set of masks has been 
generated, the whole procedure (prediction, fitness evaluation, and generation of new masks) is repeated iteratively until fitness (defined as median error between 
predicted and observed specificity profiles) cannot be improved any further (i.e., convergence is reached). 

Residues scoring high in the optimized specificity masks will be considered candidate DoS. For further details on this procedure, please refer to Supplemental 
Experimental Procedures. 



highly comparable fitness trajectories, as well as increasing sim- 
ilarity between the best-performing masks at each generation 
(Figure S2; Data S1, S2, and S3). Moreover, we confirmed that
the results are not simply due to trivial technical factors, such 
as residue conservation or alignment gaps (Figure S3), and 
that similar results could not be obtained using uniform or 
randomized sets (Figure S3). Taken together, these results 



demonstrate that KINspect is robust to arbitrary initial conditions 
and converges to a limited set of highly similar solutions (speci- 
ficity masks, Figure S3). 

Moreover, we also explored a vast number of possible combi- 
nations of residues and specificity models. Since convergence in 
the model requires approximately 2,500 cycles of the above 
three steps (in the case of the human kinase domain) and 100 
specificity masks are used at every generation, 250,000 models
were explored in the kinome-wide search for the most informa- 
tive masks. By repeating this algorithmic deployment indepen- 
dently ten times with arbitrary initial conditions, 2,500,000 
models were explored in total. The high number of models 
explored and the fact that the independent evaluations converge 
on their solutions imparted confidence that the results obtained 
could be close to the “true mask” of specificity. 
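The model counts above follow from simple multiplication, using the approximate figure of 2,500 cycles to convergence stated in the text:

```latex
\underbrace{2{,}500}_{\text{cycles}} \times \underbrace{100}_{\text{masks per generation}}
  = 250{,}000 \text{ models per run},
\qquad
10 \text{ runs} \times 250{,}000 = 2{,}500{,}000 \text{ models}.
```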

In order to further benchmark our approach, we collected an 
inclusive “golden list” of residues that had been suggested or 
predicted as DoS (Table S1) in the literature covering a variety
of methods and species (Brinkworth et al., 2003; Hanks and 
Hunter, 1995; Johnson et al., 1998; Mok et al., 2010; Nolen 
et al., 2004) and explored the possibility that the best masks 
would be enriched in this set of “golden” determinants. Indeed, 
Figure S3 shows that, while the distributions over specificity 
scores of previously reported DoS and other residues are 
probabilistically equivalent at the start of the optimization pro- 
cess, they are remarkably different at the end of it, supporting 
the aforementioned enrichment (one-sided Fisher’s exact test,
p = 8.4 × 10⁻⁷).
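The text does not spell out how the contingency table for this test was built. The sketch below assumes one plausible construction: residues are split into previously reported ("golden") DoS versus all others, and into high versus low KINspect scores. It computes the one-sided (enrichment) p-value directly from the hypergeometric distribution. The counts in the usage checks are toy numbers, not the paper's data. The result should match `scipy.stats.fisher_exact(table, alternative="greater")`.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided (greater) Fisher's exact test p-value for the 2x2 table

                         high score   low score
        golden DoS            a            b
        other residues        c            d

    i.e. the hypergeometric probability of seeing a or more golden residues
    among the high-scoring positions, given the fixed row and column totals.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    upper = min(row1, col1)
    return sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1)) / total
```

For example, if all 3 golden residues in a toy set of 6 positions land among the 3 highest-scoring positions, the p-value is 1/20 = 0.05.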

In addition to identifying candidate DoS, our approach can 
predict the domain specificity (PSSM) of every kinase in the hu- 
man kinome from sequence alone. Therefore, we could compare 
these to those kinases where the specificity profile has previ- 
ously been experimentally determined (Miller et al., 2008) and 
assess the algorithm’s predictive accuracy (Figure S2). As 
shown in Figure S2, KINspect presents better sequence-speci- 
ficity predictive capabilities for some families (e.g., CK1 group) 
than others (e.g., STE group), likely reflecting both biological dif- 
ferences and algorithmic preferences (for instance, particular 
family differences in specificity that could not be captured by 
our kinome-wide specificity masks). Finally, for a small set of 
kinases used as a “gold standard” in the DREAM challenge (Ellis 
and Kobe, 2011) and that, importantly, were not part of our 
training set, we could confirm that overall KINspect performed 
better than other methods (Figure S3). 

While the results in Figure S3E confirm enrichment in previ- 
ously reported DoS, it is also important to note that KINspect 
identified a large number of additional DoS that had not been 
reported in the literature (e.g., 82 alignment positions above 
the arbitrary threshold of having a KINspect score above 0.8). 
Thus, we set out to evaluate the likelihood that these newly iden- 
tified residues would be true DoS. Following up on our initial 
reasoning, we hypothesized that by identifying true DoS (the ki- 
nase domain residues that truly encode for the domain’s speci- 
ficity) one should be able to observe better correlations between 
kinase sequence and kinase specificity, by limiting the compar- 
ison to this specific set of residues. Indeed, Figure 3A illustrates 
how limiting the comparison to those residues that obtained 
higher KINspect scores not only maintains, but, in fact, improves 
the sequence-to-specificity correlation by approximately 20% 
(as compared to the Spearman correlation obtained by consid- 
ering the entire domain). Furthermore, we could confirm that 
other similarly small groups of residues, such as the set of previ- 
ously reported DoS, or other selection strategies, such as resi- 
dues close to the substrate, do not lead to similar improvements 
of the sequence-to-specificity correlation (Figure 3A; Figure S4). 
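The comparison in Figure 3A can be sketched as a Spearman correlation between two pairwise quantities for every kinase pair. The first is sequence identity restricted to a chosen set of alignment positions (for example, high-scoring KINspect residues or the whole domain). The second is specificity similarity between the corresponding PSSMs. The identity measure and the negative L1 distance used for specificity similarity are illustrative assumptions.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the ranks (undefined if either input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)


def sequence_specificity_correlation(seqs: Dict[str, str],
                                     pssms: Dict[str, Sequence[float]],
                                     positions: Sequence[int]) -> float:
    """Correlate pairwise identity over `positions` with pairwise PSSM similarity."""
    seq_sim, spec_sim = [], []
    for a, b in combinations(sorted(seqs), 2):
        seq_sim.append(sum(seqs[a][i] == seqs[b][i] for i in positions) / len(positions))
        # Negative L1 distance, so that a larger value means more similar specificity.
        spec_sim.append(-sum(abs(p - q) for p, q in zip(pssms[a], pssms[b])))
    return spearman(seq_sim, spec_sim)
```

Running the same function with different `positions` (all residues, the KINspect top set, residues near the substrate) gives the kind of side-by-side comparison summarized in Figure 3A.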



We next selected a group of residues predicted by KINspect 
to be DoS and devised PSPL experiments to experimentally vali- 
date their involvement in specificity. In particular, as shown in 
Figure 3B, for our first experiment we selected two of the candidate
DoS predicted by KINspect (named αC1 and αC3, as they
correspond to the first and third residues of the αC helix of the
kinase domain) with scores of 1.0 and 0.95 that are in close
proximity to residue P+2 in the peptide substrate. Next, since
PKCγ has a strong preference for Arg and Lys at P+2 that had
so far defied structural analysis, we mutated the αC1 and αC3
residues on PKCγ from the wild-type aspartates to alanines.
As shown in Figure 3C (and Figure S4), the mutant form main- 
tained the Arg preference but lost its Lys preference at this 
particular position, at the same time gaining preference for 
aromatic residues, thereby validating the specificity determining 
nature of these DoS predicted by KINspect. 

For our second experiment, we selected a position (named 
APE-7 as it is located seven residues before the APE motif 
delimiting the activation segment) with a score of 0.75 in close 
proximity to residue P+1 (Figure 3B). Similar to the case of 
PKCγ in the αC1 and αC3 residues, Pim1 features an unexplained
strong preference for Gly at position P+1, which is unusual for a
kinase belonging to the CAMK family. Thus, we mutated Pim1
from its wild-type Asp to Cys, a residue more typically seen in 
other CAMK kinases, hypothesizing that if this single substitution 
could abrogate this Gly preference on position P+1, it would 
prove the specificity driving nature of the APE-7 residue. As 
shown in Figure 3C (and Figure S4), indeed this single-point 
mutation on Pim1 leads to a shift away from P+1 Gly preference
to a non-specific profile similar to that of other CAMKs. 

Taken together, these results demonstrate that KINspect 
successfully identified a set of residues on which the specificity 
of the entire domain is encoded. 

The Determinants Form Sparse Networks of Residues 
that Together Encode Specificity 

In order to evaluate the relationship between the different DoS, as 
well as between the DoS and the peptide substrate, we investi- 
gated their spatial distribution in the kinase domain. Figure 4 
and Movie S1 show the tertiary structure of the DoS identified
by KINspect (alignment positions above the arbitrary threshold
of having a KINspect score above 0.9 across ten independent
deployments of KINspect) and offer two interesting observations:

First, we note that several of the determinants localize rela- 
tively far from the peptide substrate. However, most of these 
distant DoS seem coupled to other DoS through “canals” (i.e., 
existing structural paths connecting the different DoS among 
each other and ultimately with the substrate) that eventually con- 
tact the substrate peptide, as shown, for instance, in Figures 4B, 
4C, or 4J. Such distribution of residues in networks spanning 
different domain sites and the presence of these “canals” sug- 
gest that specificity could possibly be encoded by groups of 
residues that communicate from different parts of the domain, 
perhaps in the same way that other domains are regulated
allosterically through protein sectors (Reynolds et al., 2011).

Second, closer inspection of the results (Figure 4; Movie S1)
suggests the presence of three clusters of DoS that, while
connected by other residues that (to a lesser extent) are also
likely to contribute to specificity, are located on different
patches of the kinase domain: cluster 1, while mainly containing
residues from the bigger C-lobe (the lobe best described in 
terms of its importance for kinase specificity), also spans resi- 
dues from the N-lobe and contacts directly with, and to a large 
degree encapsulates, the substrate peptide. This could be 
considered the main cluster directly driving specificity and in- 
cludes several of the residues and structural features previously 
linked to specificity (e.g., the activation segment or the P+1 
loop; Nolen et al., 2004), as well as new ones, such as the
residues in the αC helix that we experimentally validated to encode
specificity. Cluster 2, on the other hand, is comparatively smaller
and contains exclusively residues belonging to the big C-lobe 
of the domain. Given its position, we suggest that this cluster 
of residues could affect specificity by closing (or opening) the 
domain inward (or outward), effectively modifying the size and 
shape of the binding pocket, especially on the region that con- 
tacts the N-terminal section of the substrate peptide. Finally, 
cluster 3, containing very few residues of the small N-lobe, 
seems to contribute to specificity by causing subtle structural 
re-arrangements leading to differences in the opening and clos- 
ing of the lobe onto the peptide. Overall, while all three clusters 
simultaneously encode specificity on different parts of the sub- 
strate peptide, by shaping the active site in a cumulative and 
non-linear fashion, cluster 1 appears to be the main driver of 
specificity (Figure S4). 

Domain and Specificity Evolution 

We next set out to explore whether evolutionary insights could 
be derived from these results. It has previously been observed 
that the evolution of the kinase domain as a whole is not an 
accurate reflection of how different kinases have evolved 
different peptide specificities (Miller et al., 2008; Rausell et al., 
2010). Thus, we speculated that a dendrogram based solely
on residues identified as DoS by KINspect could carry significant 
differences compared to a domain-wide phylogenetic tree. 
Indeed, Figure 5A (and Figure S5) illustrates how the relation- 
ships between kinases (and even between kinase families) 
appear to deviate when addressed from the DoS’ perspective. 
This DoS-based tree (Figures 5A and S5) illustrates interesting 
differences including: (1) the embedding of kinase families within 
other families, such as in the case of the PKN family, embedded 
within the PKC family, (2) clustering of seemingly unrelated 
families, such as the Yank and GRK families, or (3) the splitting
of families into two sets displaying marked amino acid differences
on their DoS, such as in the case of the Ste20 family. 

Thus, this analysis provides further evidence for, and an explanation of, how and why the evolution of the entire domain does not always parallel specificity evolution (Capra et al., 2012). Using the DoS-based dendrogram (built from the DoS residues predicted by KINspect), we have provided an alternative evolutionary explanation of the human kinome which, we argue, more accurately reflects functional diversity and specificity evolution. Such a view, of proteins evolving new specificities by diverging at specific sites within protein domains, is supported by other recent studies of bacterial signaling networks (Capra et al., 2012; Skerker et al., 2008).

Kinase Specificity, Regulation, and Activity Are Loaded onto Different Residues

With the aim of interpreting our results from a more global perspective, we investigated to what extent the DoS residues identified by KINspect interplay with residues known to be involved in the catalytic activation and regulation of the kinase domain.

Two independent sets of residues playing such crucial roles have been identified; they form hydrophobic interactions at the core of the domain and stabilize its active conformation (Kornev et al., 2006, 2008). These two networks of residues, critical for activation and regulation, are named the catalytic and regulatory spines, respectively. To examine how the DoS interact with the two spines (Figure 5B), we visualized the residues forming the catalytic and regulatory spines as well as those identified as DoS in the same kinase structure (Figure 5C). This representation shows that the two groups are virtually mutually exclusive: kinase domain residues belong to either the spines or the DoS set (mostly localized on the surface of the domain), but rarely to both.

Figure 3. Computational and Experimental Validation of the DoS Identified by KINspect

(A) Scatterplots comparing pairwise relationships between kinase domain sequences and their specificity profiles illustrate the presence or absence of correlation between sequence and specificity. By limiting the comparison to specific sets of residues, one can investigate whether such sets encode specificity (i.e., maintain or increase the correlation), as measured by Spearman’s correlation coefficient. By comparing the correlations obtained from different sets of residues (the whole domain on the left, previously reported determinants of specificity in the middle, and KINspect scores on the right), we confirm that residues with a high KINspect score encode specificity (e.g., residues scoring above 0.9 lead to a very high sequence-to-specificity correlation, with a Spearman’s correlation coefficient of 0.69, despite representing only 5.73% of the residues in the kinase domain alignment). Further comparisons with other sets of residues can be found in Figure S4.

(B) Three new candidate determinants of specificity predicted by KINspect, located at the first and third residues of the αC helix and seven residues before the APE motif delimiting the activation segment, are experimentally verified to encode specificity by PSPL as described in Experimental Procedures.

(C) Experimental results for the PKCγ and PIM1 mutants showing a specificity switch for the P+2 and P+1 substrate positions, shown in matrix and logo form (logos generated using Seq2Logo; Thomsen and Nielsen, 2012). Complete PSSMs describing the PSPL results for wild-type and mutant kinases can be found in Figure S4.

Figure 4. Determinants of Specificity in the Human Kinase Domain

(A) Mesh representation of the kinase domain, including its secondary structure in cartoon representation and a bound peptide substrate colored in orange. Positions predicted as DoS by KINspect (i.e., residues with a KINspect specificity importance score higher than 0.9) are highlighted in cyan, and the three “canals” formed by these determinants are outlined by red arrows.

Despite this apparent separation of biological functions in the kinase domain, it is equally important to highlight that KINspect, in agreement with previous observations (Nolen et al., 2004), identifies the activation segment as playing a critical role in specificity. Because this segment also plays a crucial role in regulation and catalysis by stabilizing the R-spine (Kornev et al., 2006, 2008), these different functions still appear to be partially intertwined on this particular segment, in spite of their general decoupling elsewhere (Figure 5C). Moreover, highlighting the distinct evolutionary and functional paths of these sets of residues, we quantified their differences in sequence conservation and found that DoS are typically residues with considerably lower conservation than the highly conserved spines and many other residues in the domain (Figure 5D).
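The conservation comparison above (Figure 5D) rests on a per-column entropy score. As a rough sketch only, the Python below scores each alignment column by its negative Shannon entropy and averages the scores over named subsets of columns (for example, DoS versus spine positions). It is an unweighted estimate and is not the AL2CO algorithm used in the study, which can additionally apply sequence weighting and its own gap treatment; the function names and the toy alignment are hypothetical. Per-column scores for two subsets could then be compared with a Wilcoxon rank-sum test, for example scipy.stats.ranksums.

```python
import math
from collections import Counter


def column_conservation(column, ignore_gaps=True):
    """Conservation of one alignment column as negative Shannon entropy (nats).

    0 means perfectly conserved; more negative means more variable.
    Unweighted estimate: it does not reproduce AL2CO's weighting or gap handling.
    """
    residues = [c for c in column if not (ignore_gaps and c == "-")]
    if not residues:
        return float("nan")  # column made only of gaps
    n = len(residues)
    entropy = -sum((k / n) * math.log(k / n) for k in Counter(residues).values())
    return -entropy


def mean_conservation(alignment, subsets):
    """Average column conservation for named subsets of 0-based column indices.

    alignment: list of equal-length aligned sequences (one per kinase).
    subsets: dict mapping a subset name (e.g. "DoS") to a list of columns.
    """
    scores = [column_conservation(col) for col in zip(*alignment)]
    return {name: sum(scores[i] for i in cols) / len(cols)
            for name, cols in subsets.items()}
```

With the made-up rows "GKLAE", "GKMSE", "GKIDE", and "GRVTE", columns 0 and 4 are invariant and score 0, while the fully variable columns 2 and 3 score -ln 4, so a DoS-like subset {2, 3} comes out less conserved than a spine-like subset {0, 4}.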

Similarly Sparse Networks of Determinants Drive Specificity in the SH2 Domain

To investigate the generality of these observations, we explored DoS patterns in another modular signaling protein domain, namely, the SH2 domain. Following an approach very similar to that described for the kinase domain, and after appropriately identifying the required parameters (Figure S6), KINspect identified several SH2 residues that are likely involved in peptide specificity (Figure 6; Movie S2).

Because the SH2 domain is smaller (typically about 100 residues, as can be appreciated in the SH2 domain alignment in Data S4) and generally shows less variability in peptide specificity, it is perhaps not surprising that KINspect converged considerably faster for the SH2 domain than for the kinase domain (Figure S6).

Despite this difference, as with the kinase domain, independent deployments of KINspect led to highly reproducible results (Figure S6), and the general model of peptide specificity observed in the kinase domain, in which a sparse network of DoS involves a relatively large number of residues, was also observed for the SH2 domain (Figure 6; Data S5). Similarly, whereas some DoS were close to the peptide (e.g., Figures 6C, 6D, and 6G), others were relatively far away from it (e.g., Figures 6E and 6I), though often connected by inter-residue “canals.” The control experiments mentioned above, in which uniform and randomized domain-specificity sets were used (Figure S3), exclude the possibility that the similarities between the results for the kinase and SH2 domains emanate from some intrinsic bias in our computational approach. The spatial distribution of several of our DoS is also supported by previous studies of SH2 domains (Halabi et al., 2009; Lenaerts et al., 2008). All in all, this suggests that our findings, with a high number of DoS residues located away from the substrate, far from being unique to kinase specificity, could reflect a more general trend applicable to other modular protein domains (Tompa et al., 2014).

DISCUSSION 

Despite the crucial importance of signaling fidelity in biological organization and in cellular responses to environmental cues, our perception of how peptide specificity is encoded in the kinase domain has been highly fragmented and biased toward certain kinase families, non-human species, or a subset of kinase domain residues (e.g., those close to the peptide substrate). Here, we developed a data-driven, systematic approach to investigate the presence of DoS residues throughout the human kinome; experimentally validated several of these DoS, which, together with those shown in the accompanying article (Creixell et al., 2015), encode specificity for the five residue positions most critical for specificity in the peptide substrate (P-3, P-2, P0, P+1, P+2); and identified a distributed, but interconnected, network of DoS in different parts of the kinase domain. In contrast to previous studies, our results suggest that specificity is driven by a larger number of residues and a more distributed network of typically non-conserved residues than previously appreciated (Figures 7A and 7B).

Determinants in the Context of Spines and Sectors 

The sparse networks of DoS also have interesting implications when compared and contrasted with previous work.

First, as mentioned earlier and illustrated in Figure 5, we note an apparent discrepancy between the residues we identify as DoS, mostly localized on the surface of the domain, and the core residues that form the catalytic and regulatory spines (Kornev et al., 2006, 2008). Whereas this suggests some degree of functional and evolutionary separation between catalytic activity (and its regulation) and peptide specificity, a separation of functions similar to that employed in other signaling systems (Goldman et al., 2014), our results also indicate that the activation segment provides a link between these biological functions. The fact that different functions seem to be “co-loaded” on this segment could explain why a large fraction of cancer mutations perturb this critical part of the kinase domain (Dixit et al., 2009; Creixell et al., 2015).

Moreover, this separation of function, together with our finding of very different evolutionary speeds and trajectories for spines and DoS, leads us to speculate that kinases have evolved within tight constraints around the spines, where maintaining spine integrity was critical to retain kinase activity. On the other hand, the looser constraints on DoS have facilitated the evolution of new kinases with distinct specificities, a view that is consistent with the current understanding of the evolution of signaling systems (Lim and Pawson, 2010).

Furthermore, the picture portrayed by our results, of sparse networks of multiple residues jointly driving specificity, fits within the scope of more recent theories of protein function, namely, the so-called protein sector model. According to this model, protein function is often encoded in protein sectors, defined as subsets of co-evolving residues (Halabi et al., 2009; Lockless and Ranganathan, 1999) identified in different protein domains, which often also include long-range interactions between distant residues through allosteric regulation (Reynolds et al., 2011). Our results suggest that similar mechanisms could be at work in determining specificity in both the kinase and SH2 domains.

Perspectives 

Although KINspect provides a significant conceptual and analytical leap forward in terms of capability and coverage, continued experimental and computational advances should make it even more precise and accurate in the future.



Figure 4 (continued)

(B-K) For a clearer representation of different parts of the structure, longitudinal (B-F) and transversal (H-K) slices were taken through the kinase domain at the planes indicated in the inset of (A). A dynamic visualization of this structure can be found in Movie S1. The structure used is that of Akt/PKB in complex with a GSK3 peptide (PDB ID: 1O6K; Yang et al., 2002), and the structural visualizations in this and subsequent figures were generated using Chimera (Pettersen et al., 2004).
From an experimental perspective, it is clear that obtaining peptide specificity profiles for a larger number of kinases (currently, only about 30% of the human kinome has been profiled for peptide specificity) can only improve our method’s results.

In terms of extending and expanding our current approach to other applications, KINspect’s methodology could potentially be applied to several other fundamental biological questions, such as the identification of residues driving kinase inhibitor binding and specificity. Naturally, we also plan to expand KINspect to include new peptide-recognizing modular domains beyond the already-included kinase and SH2 domains (e.g., SH3 or WW domains), and to incorporate inter-positional dependencies within the substrate peptide when such data become available.

Implications for Evolution and Disease 

As introduced earlier, peptide specificity is a crucial component of a wider cellular requirement, signal fidelity, which ensures that cells correctly decode input cues and respond accordingly. Changes in this system have been identified as playing a critical role in multicellular metazoan evolution (Tan et al., 2009, 2011) and also, at the domain level, in how proteins evolve new specificities that allow cells to start responding to new cues or to unfold new responses to them (Capra et al., 2012; Marengere et al., 1994; Skerker et al., 2008; Zarrinpar et al., 2003). Although this has been less studied in a disease context, it has been suggested that the same process occurs in cancer (Borrello et al., 1995; Santoro et al., 1995; Songyang et al., 1995). In the accompanying article (Creixell et al., 2015), we use the bona fide DoS described here to identify cancer mutations perturbing them and experimentally validate their role in causing signaling rewiring (Creixell et al., 2012), and thus in contributing to oncogenesis by affecting kinase specificity. We are optimistic that these mutations, and new ones to be identified in the future, will constitute a novel and solid foundation for an enhanced appreciation of how signaling networks are perturbed in cancer and other diseases.

EXPERIMENTAL PROCEDURES 
Learning Classifier System 

The learning classifier system briefly described in the main text, which constitutes the computational engine behind KINspect, is illustrated in Figure 2. Further algorithmic and mathematical details can be found in Supplemental Experimental Procedures.

Frobenius Distance between Matrices or Vectors 

As a measure of dissimilarity between matrices or vectors, the Frobenius distance (the Frobenius norm of their difference) can be calculated as the square root of the sum of the squared differences between corresponding values in the two matrices or vectors, d(A, B) = sqrt(Σ_ij (A_ij - B_ij)^2) (Ellis and Kobe, 2011).
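A minimal NumPy sketch of this distance, assuming both inputs are numeric arrays of identical shape (for example, two specificity matrices with peptide positions as rows and amino acids as columns; the function name and example values are hypothetical). For 2-D arrays the result matches np.linalg.norm(a - b), whose default for matrices is the Frobenius norm.

```python
import numpy as np


def frobenius_distance(a, b):
    """Square root of the sum of squared element-wise differences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("inputs must have the same shape")
    return float(np.sqrt(np.sum((a - b) ** 2)))
```

For instance, two made-up 2 × 2 profiles that differ by 1.0 in a single entry are at distance 1.0, and comparing an all-zero matrix with one holding 3 and 4 on the diagonal gives 5.0.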

Domain Information and Alignments 

Domain sequences for all human kinase domains and additional information on the human kinome were obtained from the http://kinase.com/ repository, with more recent, unpublished data kindly provided by Dr. Gerard Manning (G. Manning, personal communication; Manning et al., 2002). Similar sequence and domain information was obtained for all human SH2 domains from the SH2 domain site (Liu et al., 2006). Sequences were aligned using ClustalW2 (Larkin et al., 2007), and alignments were further refined manually with help from Dr. Toby Gibson (EMBL).

Dendrogram Construction 

Distance matrices between kinases were computed using the BLOSUM62 substitution matrix (Henikoff and Henikoff, 1992). The distances in the kinome tree are based on all columns in the alignment, while the distances in the specificity tree only consider the selected DoS columns. We used neighbor joining to build both trees.
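To illustrate the tree-building step, the sketch below implements textbook neighbor joining on a precomputed distance matrix in plain NumPy. It assumes the pairwise kinase distances have already been derived from BLOSUM62 scores over the chosen alignment columns (all columns for the kinome tree, DoS columns only for the specificity tree); that conversion is not shown, and the function name and edge-list output format are hypothetical. Libraries such as Biopython (Bio.Phylo.TreeConstruction) also provide ready-made implementations.

```python
import numpy as np


def neighbor_joining(D, names):
    """Build an unrooted tree by neighbor joining.

    D: symmetric (n, n) distance matrix; names: n leaf labels.
    Returns a list of (parent, child, branch_length) edges; internal nodes
    are labeled "node0", "node1", ... in the order they are created.
    """
    D = np.array(D, dtype=float)
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = D.sum(axis=1)
        # Q criterion: the pair with the smallest Q is joined next
        Q = (n - 2) * D - r[:, None] - r[None, :]
        np.fill_diagonal(Q, np.inf)
        i, j = (int(x) for x in np.unravel_index(np.argmin(Q), Q.shape))
        # branch lengths from the two joined nodes to their new parent
        li = 0.5 * D[i, j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = D[i, j] - li
        parent = f"node{counter}"
        counter += 1
        edges += [(parent, nodes[i], float(li)), (parent, nodes[j], float(lj))]
        # distances from the new parent to every remaining node
        d_new = 0.5 * (D[i] + D[j] - D[i, j])
        keep = [k for k in range(n) if k not in (i, j)]
        D_next = np.zeros((n - 1, n - 1))
        D_next[:-1, :-1] = D[np.ix_(keep, keep)]
        D_next[-1, :-1] = d_new[keep]
        D_next[:-1, -1] = d_new[keep]
        D = D_next
        nodes = [nodes[k] for k in keep] + [parent]
    edges.append((nodes[0], nodes[1], float(D[0, 1])))
    return edges
```

On a small additive matrix (distances that come from an actual tree), the recovered branch lengths reproduce the generating tree; in the setting described above, kinase names and the BLOSUM62-derived matrix would take the place of toy labels.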

Computing Minimum Distance to Substrate from PDB Files 

As described in the accompanying article (Creixell et al., 2015), we computed the minimum distance between each position in our alignment and the substrate peptide. This distance was obtained by extracting distance information from ten representative kinase-substrate structures deposited in the PDB: AKT2 (PDB ID: 1O6K; Yang et al., 2002), PIM1 (PDB ID: 2BZK; Bullock et al., 2005), DYRK1A (PDB ID: 2WO6; Soundararajan et al., 2013), CDK2 (PDB ID: 2CCI; Cheng et al., 2006), PAK4 (PDB ID: 2Q0N; Chen et al., 2014), EPHA3 (PDB ID: 3FXX; Davis et al., 2009), FES (PDB ID: 3CD3; Filippakopoulos et al., 2008), EGFR (PDB ID: 2GS6; Zhang et al., 2006), IGF1R (PDB ID: 1K3A; Favelyukis et al., 2001), and INSR (PDB ID: 3BU3; Wu et al., 2008). Using in-house Python scripts built on the Biopython module Bio.PDB, we extracted distances between every residue of these kinase-substrate pairs. We then used the alignment to track the same position across the different kinase-substrate structures and obtained the minimum distance for each alignment position. Additional information on substrate peptide distance for the different mask positions can be found in Data S3.
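The per-column minimum described above can be sketched as follows. This is not the authors’ in-house script: in Biopython, coordinates would typically be read with Bio.PDB.PDBParser and each atom’s coord attribute, but this sketch takes coordinates directly so that it runs without PDB files. The dictionary layout, the function names, and the assumption that residues are numbered consecutively along the ungapped aligned sequence are all hypothetical simplifications.

```python
import numpy as np


def residue_to_column(aligned_seq, first_resnum):
    """Map residue numbers to 0-based alignment columns, skipping gaps.

    Assumes residues are numbered consecutively from first_resnum along the
    ungapped sequence; real PDB numbering can have gaps or insertion codes.
    """
    mapping = {}
    resnum = first_resnum
    for col, char in enumerate(aligned_seq):
        if char != "-":
            mapping[resnum] = col
            resnum += 1
    return mapping


def min_distance_per_column(structures, n_columns):
    """Minimum kinase-to-substrate atom distance for every alignment column.

    Each structure is a dict with hypothetical keys:
      "kinase": {resnum: (k, 3) array of atom coordinates},
      "substrate": (m, 3) array of substrate peptide atom coordinates,
      "aligned_seq": the kinase's row in the alignment,
      "first_resnum": residue number of the first aligned residue.
    Columns never observed in any structure stay at infinity.
    """
    best = np.full(n_columns, np.inf)
    for s in structures:
        col_of = residue_to_column(s["aligned_seq"], s["first_resnum"])
        sub = np.asarray(s["substrate"], dtype=float)
        for resnum, atoms in s["kinase"].items():
            if resnum not in col_of:
                continue
            atoms = np.asarray(atoms, dtype=float)
            # every atom-atom distance between this residue and the peptide
            d = np.linalg.norm(atoms[:, None, :] - sub[None, :, :], axis=-1).min()
            col = col_of[resnum]
            best[col] = min(best[col], d)
    return best
```

Taking the minimum across structures means a column is considered close to the substrate if it is close in any of the ten complexes, which matches the "minimum distance for each alignment position" described in the text.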

Figure 5. Evolutionary Aspects of DoS and Their Co-existence with Kinase Spines

(A) As can be observed from the different panels of this DoS-based dendrogram, in which several kinases are placed discordantly with whole-domain evolution, peptide specificity evolution cannot be directly inferred from whole-domain evolution. These differences highlight how kinases have accumulated mutations at these specific residues, i.e., the DoS, in order to evolve different specificities. For further explanation and information, please refer to Experimental Procedures and Figure S5.

(B) We next investigated how DoS co-evolved with residues involved in structural changes related to catalysis (kinase spines). Different degrees of co-existence between DoS and spines are possible, ranging from complete overlap (left) to complete exclusion (right). In (C), we investigate which of these models is better supported by our data.

(C) By comparing the relative localization of the DoS (top-left structure) with the residues belonging to the catalytic spine (in yellow, bottom-left structure), the regulatory spine (in red, top-right structure), or all residues together (bottom-right structure), our data suggest that the DoS and spine subgroups are mutually exclusive or, in other words, that residues classified as DoS are not part of the catalytic or regulatory spines. As in Figure 4A, the structure used is that of Akt/PKB in complex with a GSK3 peptide (PDB ID: 1O6K; Yang et al., 2002).

(D) Evolutionary conservation for the different subsets of residues (whole domain, DoS, C-spine, and R-spine) was computed as the negative of entropy using the AL2CO algorithm with its default parameters (50), and was shown to be significantly lower in DoS than in the whole domain and the spines (p = 0.014 and p = 1.4 × 10⁻⁶, respectively, using Wilcoxon test).

Figure 6. Determinants of Specificity in the Human SH2 Domain

(A) Mesh representation of the SH2 domain, including its secondary structure in cartoon representation and a bound peptide substrate colored in orange. Positions predicted as DoS by KINspect (i.e., residues with a KINspect score higher than 0.9) are highlighted in cyan.

(B-I) As in the case of the kinase domain, longitudinal (B-E) and transversal (F-I) slices were taken through the SH2 domain at the planes indicated in the inset in (A). For a dynamic visualization of this structure, please refer to Movie S2. The structure used is that of SAP in complex with a SLAM peptide (PDB ID: 1D4T; Poy et al., 1999).

PSPL Analysis

PKCγ (WT and mutant) was produced in HEK293T cells with a 3×FLAG epitope tag at the C terminus and isolated by affinity purification on M2 FLAG antibody resin (Sigma-Aldrich) as described (Mok et al., 2010). Pim1 (WT and mutant) was expressed as an N-terminally hexahistidine-tagged fusion protein in E. coli and purified from lysates using TALON resin (Clontech).
Peptide library analysis was performed by arraying a set of 182 peptide mixtures (50 μM) in a 1,536-well plate in kinase reaction buffer (2 μl/well). Buffer for Pim1 reactions was 50 mM HEPES (pH 7.4), 10 mM MgCl2, 0.1% Tween 20, and buffer for PKCγ reactions was 50 mM Tris-HCl (pH 7.5), 10 mM MgCl2, 1 mM DTT, 0.1% Tween 20 containing a 5-fold dilution of lipid activator (EMD Millipore). Peptides had the sequence Y-A-X-X-X-X-X-S/T-X-X-X-X-A-G-K-K-biotin, in which X positions were generally an equimolar mixture of the 17 amino acids excluding Ser, Thr, and Cys, and S/T is an even mixture of Ser and Thr. In each well of the array, the peptide had one of the 20 amino acids fixed at one of the nine X positions. In addition, two peptides were included that fixed either Ser or Thr at the phosphoacceptor position. Reactions were initiated by adding kinase (to 8 μg/ml) and [γ-33P]ATP (50 μM at 0.03 μCi/μl), incubated 2 hr at 30°C, and then 200-nl aliquots were transferred to a streptavidin membrane (Promega). Membranes were washed and dried as described and exposed to a phosphor screen. Radiolabel incorporation into each peptide mixture was quantified by phosphor imaging using QuantityOne software (Bio-Rad). Following background subtraction, data were normalized so that the average value for a given position within the peptide was equal to 1. Normalized data from two (PKCγ) or three (Pim1) separate runs were averaged, log2 transformed, and converted to heatmaps in Microsoft Excel.
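The normalization steps just described can be sketched in NumPy as follows. This is an illustrative reconstruction, not the authors’ spreadsheet workflow: it assumes each run is a (positions × amino acids) array of phosphor-imaging signals, that background is a single value per run, and it introduces a hypothetical floor so that values that turn negative after subtraction do not break the log2 step. As in the text, replicate runs are averaged before the log2 transform.

```python
import numpy as np


def normalize_pspl_run(raw, background, floor=1e-3):
    """Background-subtract one PSPL run and scale each peptide position to mean 1.

    raw: (positions, residues) array of radiolabel signals.
    background: a single background value for the run (assumed).
    floor: hypothetical minimum signal kept after subtraction so that
           later log2 values stay finite.
    """
    signal = np.maximum(np.asarray(raw, dtype=float) - background, floor)
    # each row (one fixed peptide position) is divided by its own average
    return signal / signal.mean(axis=1, keepdims=True)


def pspl_log2_matrix(runs, backgrounds):
    """Average normalized replicate runs, then log2-transform (for heatmaps)."""
    normalized = [normalize_pspl_run(r, b) for r, b in zip(runs, backgrounds)]
    return np.log2(np.mean(normalized, axis=0))
```

After this transform, a value of 0 means a residue at that position is phosphorylated at the position’s average level, and positive or negative values indicate preference or disfavor.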
SUMMARY

Cancer cells acquire pathological phenotypes through accumulation of mutations that perturb signaling networks. However, global analysis of these events is currently limited. Here, we identify six types of network-attacking mutations (NAMs), including changes in kinase and SH2 modulation, network rewiring, and the genesis and extinction of phosphorylation sites. We developed a computational platform (ReKINect) to identify NAMs and systematically interpreted the exomes and quantitative (phospho-)proteomes of five ovarian cancer cell lines and the global cancer genome repository. We identified and experimentally validated several NAMs, including PKCγ M501I and PKD1 D665N, which encode specificity switches analogous to the appearance of kinases de novo within the kinome. We discover mutant molecular logic gates, a drift toward phospho-threonine signaling, weakening of phosphorylation motifs, and kinase-inactivating hotspots in cancer. Our method pinpoints functional NAMs, scales with the complexity of cancer genomes and cell signaling, and may enhance our capability to therapeutically target tumor-specific networks.

INTRODUCTION 

Since the discovery of the first oncogene, Src (Stehelin et al., 1976), and tumor suppressor, Rb (Friend et al., 1986), more than three decades ago, our understanding of some of the specific genetic aberrations supporting cancer progression has steadily risen. Recent advances in next-generation sequencing technologies have led to the identification of large numbers of somatic cancer mutations through whole-genome and exome sequencing of tumors. Given how complex it is to assess the relevance of this enormous repertoire of reported somatic cancer mutations (currently in excess of 1 million variants) (Forbes et al., 2011), the discovery of new somatic mutations has vastly outpaced our ability to unravel their functional roles (Figure S1A).

Despite the fact that alterations to the physiological cellular responses to environmental cues are fundamental hallmarks of cancer cells (Hanahan and Weinberg, 2000) and that cellular responses to input cues are driven by signaling networks, a comprehensive understanding of how mutations perturb these networks is still missing. In fact, new conceptual paradigms and computational strategies allowing better assessment of the intrinsic complexities of cancer cells, such as the integration of cancer genomic and proteomic data, have recently been pinpointed as key requirements in the field of cancer research (Weinberg, 2014; Yaffe, 2013). Specifically, new approaches for decoding mutations that perturb signaling networks (or, as we term them, “network-attacking” mutations [NAMs]) (Creixell et al., 2012a) and the mechanisms by which they may statically or dynamically alter these networks will be fundamental in closing this gap (Figure S1B) (Yaffe, 2013). Here, we describe and validate such a conceptual and computational framework, capable of identifying, classifying, and unraveling the impact of numerous predicted NAMs.

RESULTS 

Classifying Mutations Affecting Signaling Networks 

To evaluate whether cancer mutations perturb signaling networks, we first developed a classification system with concrete types of NAMs, which we divide into three fundamental classes.

The first, relatively well-described type of NAM disrupts signaling network dynamics by constitutively activating or inactivating a protein kinase, thereby keeping the information flow either “on” or “off” uninterrupted over time. Examples of such “on” mutations are substitutions that mimic activation loop phosphorylation, whereas examples of “off” mutations include those that alter catalytically essential residues of kinases, or residues in SH2 domains that are critical for phospho-tyrosine binding. Since the timely activation and termination of signals is critical for proper cellular homeostasis as well as for phenotypic responses to environmental stimuli, such mutations lead to aberrant information processing (Figure 1A).

A second, largely undescribed type of NAM comprises mutations that shift the signaling network structure by “rewiring” upstream or downstream interactions (of the mutated protein or node). Upstream rewiring can be caused by mutations in a kinase substrate that disrupt the linear motif around a phosphorylation site, thus causing a new upstream kinase to phosphorylate the mutant substrate. Downstream rewiring, in contrast, can be caused by drifts in peptide specificity upon mutation of the determinants of specificity (DoS) in kinase (or SH2) domains (Creixell et al., 2015 [this issue of Cell]) (Figure 1A).

Finally, we hypothesized that a third type of NAM could exist, in which mutations generate or destroy phosphorylation sites, effectively generating new molecular logic gates in cancer cells (Figure 1A).

Node inactivation and node activation would fall within the categories traditionally referred to as loss-of-function and gain-of-function hypermorphic mutations, respectively, while the other mutations would fit best within a gain-of-function neomorphic classification.

The ReKINect Methodology 

With the aim of systematically identifying NAMs in phosphorylation-based signaling networks, we developed a computational approach, ReKINect, capable of predicting these defined functional mutations (Figures 1A and 1B; http://ReKINect.science).

We began by assembling comprehensive sequence and positional information covering all 538 known kinase domains, 111 SH2 domains, and 149,838 phosphorylation sites in the human proteome (see the Experimental Procedures for further information). This information facilitated the mapping of NAMs onto these domains and the modeling of the likely functional effect of mutations (Figure 1B). Mutations in established or predicted functional residues (essential residues of the different domains, determinants of specificity identified in our accompanying paper [Creixell et al., 2015], and phosphorylation sites) would then be predicted to lead to dysregulation of network dynamics, network rewiring, and gain or loss of phosphorylation sites (Figure 1B).
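As a simplified illustration of the rule-based mapping summarized in Figure 1B, the Python sketch below flags candidate NAM classes for a single missense mutation from position-level annotations. It is not ReKINect’s implementation: the annotation sets, class labels, and the ±5-residue motif window (borrowed from the flanking window described in the Figure 2 legend) are assumptions, and a real upstream-rewiring call would additionally require predicting which kinase recognizes the mutant motif.

```python
from dataclasses import dataclass

PHOSPHO_ACCEPTORS = set("STY")
ACIDIC = set("DE")


@dataclass(frozen=True)
class Mutation:
    protein: str
    position: int  # residue number in the protein
    wt: str        # wild-type residue (one-letter code)
    mut: str       # mutant residue (one-letter code)


def classify_nam(m, essential, dos, activation_sites, phosphosites, window=5):
    """Flag candidate NAM classes for one missense mutation.

    All annotation sets are hypothetical inputs of (protein, position) pairs:
      essential        - catalytically essential kinase/SH2 residues
      dos              - determinants of specificity
      activation_sites - phosphosites whose phosphorylation activates a kinase
      phosphosites     - all known phosphorylation sites
    window: flanking residues treated as the phosphorylation motif.
    """
    key = (m.protein, m.position)
    classes = []
    if key in essential:
        classes.append("node inactivating")
    # acidic substitution at an activating phosphosite mimics phosphorylation
    phosphomimetic = (key in activation_sites and m.wt in PHOSPHO_ACCEPTORS
                      and m.mut in ACIDIC)
    if phosphomimetic:
        classes.append("node activating")
    if key in dos:
        classes.append("downstream rewiring")
    if key in phosphosites and not phosphomimetic and m.mut not in PHOSPHO_ACCEPTORS:
        classes.append("phosphosite extinction")
    if m.wt not in PHOSPHO_ACCEPTORS and m.mut in PHOSPHO_ACCEPTORS:
        # only a candidate: a new site must still be confirmed, e.g. by MS
        classes.append("phosphosite genesis (candidate)")
    # mutation in the flanking motif (not the site itself) of a known site
    if any(p == m.protein and 0 < abs(pos - m.position) <= window
           for p, pos in phosphosites):
        classes.append("upstream rewiring (candidate)")
    return classes
```

Returning a list rather than a single label reflects that one substitution can, in principle, fall into more than one class (for example, a DoS residue that also lies next to a known phosphorylation site).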

Below, we provide an overview, further details, and experimental evidence using a wide range of techniques (including genome-specific global phospho-proteomics, peptide specificity, and phenotypic data) for the different predictions generated by the ReKINect algorithm, and we explore the impact on signaling networks of the NAMs we identify.

Quantifying NAMs in Cancer Repositories and Cell Lines 

Having defined the different NAMs, we next set out to assess their existence and abundance in cancer. We thus collected a set of 678,050 unique missense somatic cancer variants from COSMIC (version 67) (Forbes et al., 2011) and deployed ReKINect on this set, which predicted a large number of instances across the NAM classes (Figure 2).

To investigate NAMs experimentally, we performed a global integrative analysis combining exome next-generation sequencing (NGS) and quantitative mass spectrometry (MS)-based (phospho-)proteomics on a set of five ovarian cancer cell lines (ES2, OVAS, OVISE, TOV-21, and KOC-7C; Figures S1 and S2) and conducted genome-specific proteomics analyses (Experimental Procedures). By following a spike-in SILAC-based labeling strategy (Geiger et al., 2011) (Figures S1 and S2; Experimental Procedures), we could identify and accurately quantify on average more than 6,000 unique phosphorylation sites across over 2,000 proteins in each of the five cell lines. Furthermore, NGS identified close to 9,000 unique missense variants per cell line (including SNPs and germline mutations as well as somatic mutations) that were subsequently interpreted by ReKINect (Figure 2).
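In a spike-in SILAC design, each cell line is quantified relative to the same heavy-labeled standard, so two cell lines can be compared through a ratio of ratios. A minimal sketch, assuming per-site light/heavy ratios are already available as dictionaries and that the same amount of spike-in standard was added to every sample (the function name and site identifiers are hypothetical):

```python
import math


def log2_ratio_of_ratios(lh_a, lh_b):
    """log2 fold change per phosphosite between samples A and B.

    lh_a, lh_b: dict mapping a site identifier to its light/heavy (L/H) ratio.
    Because both samples were measured against the same heavy spike-in
    standard, the standard cancels out: (L_A/H) / (L_B/H) = L_A / L_B.
    Only sites quantified in both samples are returned.
    """
    shared = sorted(lh_a.keys() & lh_b.keys())
    return {site: math.log2(lh_a[site] / lh_b[site]) for site in shared}
```

For example, a site with L/H = 2.0 in one cell line and 0.5 in another yields a log2 fold change of 2, while sites seen in only one sample are left out rather than imputed.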

As shown in Figure 2 (and Data S1-S6), ReKINect identified functional mutations covering each class of NAM included in our model, as well as enrichments in these functional mutations (Figure S1). In addition, we computed the frequency at which different protein domains are affected by cancer mutations in the global repository of somatic cancer mutations, as a means to provide general estimates of the likelihood of finding perturbations in different modular protein domains in cancer (Figure S1).

Given our currently limited knowledge about the different processes that can lead to the different NAMs (e.g., phospho-mimicking mutations are the only case currently covered by ReKINect that results in kinase activation), the number of functional mutations presented in Figure 2 is most likely a significant underestimation. Nevertheless, in the following sections we provide further details and evidence supporting the existence of these predicted NAMs in cancer signaling networks.

Genesis and Extinction of Phosphorylation Sites and Circuitry

Having collected both exome sequencing and proteomic data on the same set of cancer cell lines, we were able to address whether mutations could create new phosphorylation sites or destroy existing ones, thereby generating new cancer-associated molecular logic gates within the signaling circuitry of a cancer cell. To identify such events, we specifically queried the global sequencing data for the appearance of phosphorylatable residues resulting from mutations, some of which could be experimentally verified by mass spectrometry to be bona fide sites. Strikingly, this approach uncovered several examples of mutations that lead to the genesis of new phosphorylatable sites, which become recognized and phosphorylated by kinases (Figures 3A and S2). Among the proteins harboring these neomorphic phosphorylation sites were TANC1 and HSF1 (Figure 3A). While little is known about TANC1, HSF1 is a heat shock transcription factor previously reported to be associated with carcinogenesis and poor prognosis, as well as with supporting malignancy in a variety of cancers (Dai et al., 2007). Thus, further investigations of this new phosphorylation site on HSF1 and its
Figure 1. Network-Attacking Mutations

(A) Six distinct types of network-attacking mutations (NAMs) can be defined based on perturbations of signaling network dynamics, network structure, and 
dysregulation of phosphorylation sites. Cancer mutations could generate or destroy molecular logic gates, for example by creating new, or by removing existing, 
phosphorylation sites. Alternatively, mutant proteins could become activated by new upstream proteins (incoming edges) or start perturbing new downstream 
substrates (outgoing edges). Finally, mutations could turn signaling proteins (e.g., protein kinases) constitutively “on” or “off.” The effect of these NAMs on the
cue-signal-output flow of information is illustrated for each type, comparing the wild-type (WT) and mutant (Mut) cases.

(B) After mapping mutations at the genomic and proteomic level, every NAM class defined in (A) is modeled on the different protein domains and motifs currently 
included in ReKINect following a distinct procedure: mutations on the essential residues of the kinase and SH2 domains are classified as node inactivating. Acidic 
mutations mimicking the phosphorylated/active state of kinases are classified as node activating. Mutations perturbing phosphorylation motifs and causing 
changes in the upstream kinase phosphorylating the target protein are classified as upstream rewiring. On the other hand, mutations in residues that determine 
specificity of the kinase or SH2 domains (Creixell et al., 2015) perturb domain specificity and are classified as downstream rewiring. Finally, our genome-specific
MS experiments enable the identification of mutations generating phosphorylatable residues or the extinction of phosphorylation sites by mutating away from 
phosphorylatable residues. 

See also Supplemental Experimental Procedures. 
Figure 2. Overview of NAMs in Cancer Cell Lines and in the Global Repository of Cancer Somatic Mutations as Predicted by ReKINect

For each cell line and for the global repository of cancer somatic mutations we show the number of unique missense variants and how many of these variants fall 
within kinase proteins, SH2 proteins or phosphorylation sites (using a five-residue flanking region window surrounding the phosphorylation site). From these 
we then illustrate the fraction of variants falling within the respective domains and the fraction that can be interpreted by ReKINect. In the case of ES2, all
27 variants hitting an SH2 protein fall outside SH2 domains, so ReKINect could not make any predictions as to their effect (ghosted). Note that
the genesis of phosphorylation sites cannot be predicted from in silico analysis alone but requires genome-specific MS experiments. See also Figure S1.



predicted cell-cycle-dependent upstream kinase, CDK2, may
lead to new insights into the role of this heat-shock factor in
cancer (Figure 3A).
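At the sequence level, the genesis/extinction question reduces to whether a missense substitution creates or removes a phosphorylatable residue (Ser, Thr, or Tyr). The Python sketch below illustrates only that first screening step, assuming mutations are written in one-letter notation such as T281M; the function name is hypothetical and not part of ReKINect. A "genesis" call here is merely a candidate until MS confirms phosphorylation, and a true extinction additionally requires the lost residue to be a known phosphorylation site.

```python
import re

PHOSPHO_ACCEPTORS = frozenset("STY")  # residues that protein kinases can phosphorylate

# One-letter missense notation, e.g. "T281M": wild-type residue, position, mutant residue.
_MISSENSE = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def classify_phosphosite_change(mutation: str) -> str:
    """Return 'genesis', 'extinction', or 'neutral' for a one-letter missense mutation."""
    match = _MISSENSE.match(mutation)
    if match is None:
        raise ValueError(f"not a one-letter missense mutation: {mutation!r}")
    wt, _position, mut = match.groups()
    wt_is_acceptor = wt in PHOSPHO_ACCEPTORS
    mut_is_acceptor = mut in PHOSPHO_ACCEPTORS
    if mut_is_acceptor and not wt_is_acceptor:
        return "genesis"     # a new phosphorylatable residue appears
    if wt_is_acceptor and not mut_is_acceptor:
        return "extinction"  # an existing phosphorylatable residue is lost
    return "neutral"         # acceptor status unchanged (e.g. S->T or A->G)
```

Applied to the two cell-line examples discussed in this section, RAB11FIP1 T281M and TNKS1BP1 S1533G are both flagged as "extinction"; a hypothetical A100S would be flagged as "genesis".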

In order to discover NAMs destroying phosphorylation sites, 
we combined our exome sequencing data with those from the 
quantitative mass-spectrometry analysis of the phospho-pro- 
teomes of the five ovarian cancer cell lines. This enabled us 
to perform genome-specific searches of the mass-spectra, in 
order to identify direct proteomic evidence of the destruction 
of phosphorylation sites (Figure 3B) by identifying the mutated 
but unmodifiable peptides. This approach enabled us to identify 
380 variants in our five cell lines and 6902 in the global repository 
of cancer mutations destroying phosphorylation sites (Experi- 
mental Procedures and Figure 2). 
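Genome-specific MS searches rely on the mutant peptide, rather than the reference one, being present in the search database. A minimal Python sketch of how such a variant peptide could be derived, assuming simplified trypsin rules (cleavage after Lys or Arg unless the next residue is Pro) and no missed cleavages; the helper names and the toy sequence are hypothetical, and real search pipelines also handle missed cleavages, length limits, and modifications.

```python
def apply_missense(sequence: str, position: int, wt: str, mut: str) -> str:
    """Substitute residue `wt` at 1-based `position` with `mut`, checking the reference."""
    if sequence[position - 1] != wt:
        raise ValueError(f"reference residue at {position} is {sequence[position - 1]}, not {wt}")
    return sequence[: position - 1] + mut + sequence[position:]


def tryptic_peptides(sequence: str) -> list[tuple[int, str]]:
    """Split into (1-based start, peptide) pairs: cleave after K/R unless followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append((start + 1, sequence[start : i + 1]))
            start = i + 1
    if start < len(sequence):  # C-terminal peptide without a trailing K/R
        peptides.append((start + 1, sequence[start:]))
    return peptides


def peptide_covering(sequence: str, position: int) -> str:
    """Return the tryptic peptide that contains 1-based `position`."""
    for start, peptide in tryptic_peptides(sequence):
        if start <= position < start + len(peptide):
            return peptide
    raise IndexError(position)
```

For the toy sequence AGKLTPEARPSTR, the substitution T5M turns the covering peptide LTPEARPSTR into LMPEARPSTR, which is the species a genome-specific search would need to match (the R before P is not cleaved under the rule assumed here).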

Two such events from the cell lines, illustrated in Figure 3B, are
RAB11FIP1 (T281M) and TNKS1BP1 (S1533G). Whereas the role
of RAB11FIP1 in cancer is less clear, Tankyrase-1-binding
protein (TNKS1BP1) binds Tankyrase, which in turn associates
with TRF1 protein at the telomeres. This complex is not only
tightly regulated during cell-cycle progression but also, critically,
regulates telomere length by binding to the double-stranded
TTAGGG repeats of telomeres. This, together with the fact
that Aurora kinase B (AurKB), a key cell-cycle mitotic kinase
(Alexander et al., 2011), is predicted by NetworKIN (Linding
et al., 2007) to phosphorylate the wild-type form of TNKS1BP1,
suggests a potential role for this mutant variant in cell-cycle and
telomere-length dysregulation.

To further characterize and assess the phenotypic impact
of mutations resulting in the genesis and destruction of
phosphorylation sites, respectively, we performed siRNA-based
knockdown of both TANC1 and RAB11FIP1 across the five cell
lines. While knockdown effects could certainly be attributable
to many factors other than these specific mutations, we
surprisingly observed phenotypic effects (Figure S3 and
Supplemental Experimental Procedures) supporting
Figure 3. NAMs Leading to Genesis and Extinction of Phosphorylation Sites

(A) Two examples of network-attacking mutations generating new phosphorylation sites on HSF1 and TANC1, as evidenced by exome sequencing data and MS
spectra matching the phosphorylated mutation. 

(B) Two examples of network-attacking mutations causing the extinction of known phosphorylation sites on RAB11FIP1 and TNKS1BP1, supported by
exome-sequencing data and MS spectra matching the unphosphorylatable mutated residue.

See also Figures S2 and S3. 
the most parsimonious expectations arising from ReKINect’s
predictions. 

Above, we aimed to present the clearest and best-supported
instances of NAMs that generate and destroy phosphorylation
sites, which we achieved by integrating exome-sequencing and
MS experimental data under stringent selection criteria
(Experimental Procedures). We therefore expect that many more
NAMs leading to the genesis and extinction of molecular logic
gates exist.

Kinase Downstream Rewiring 

Next, to explore whether cancer mutations can hit residues
that determine kinase specificity (determinants of specificity
[DoS]) and thereby impose downstream rewiring, we included
the results from the KINspect algorithm, described in the
accompanying article (Creixell et al., 2015), in the ReKINect platform.
Drawing on the global repository of cancer-associated
somatic mutations, we could predict a large set of putative
NAMs leading to downstream rewiring (Experimental
Procedures; Table S1).

Following a prioritization procedure described in the
Supplemental Experimental Procedures, we compiled a ranked list of
cancer somatic mutations with the highest potential to cause
downstream rewiring (Table S1). The list includes 1,871 unique
missense mutations predicted to alter determinants of specificity
by hitting the kinase domain residues most likely to play
significant roles in specificity (specificity score higher than 0.9).
Even with maximum stringency filters, focusing on the single
kinase position most likely to drive specificity (highest specificity
score of 1.0, previously reported in the literature as a
determinant of specificity [Brinkworth et al., 2003], and in direct physical
contact with the substrate at a distance of <3 Å), we identified
42 unique missense mutations at this specific position covering
all branches of the human kinome tree (Table S1).
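The two filtering tiers described above can be expressed as simple predicates over per-mutation annotations. A minimal Python sketch; the record type and its field names (`specificity_score`, `literature_dos`, `substrate_distance_angstrom`) are hypothetical stand-ins for the KINspect score, prior literature support, and structural distance, while the thresholds (0.9, 1.0, and 3 Å) are the ones quoted in the text. The actual prioritization procedure is described in the Supplemental Experimental Procedures and may involve additional criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DoSMutation:
    """Hypothetical annotation record for a cancer mutation in a kinase domain."""
    name: str
    specificity_score: float            # KINspect-style score of the hit alignment position
    literature_dos: bool                # position previously reported as a determinant of specificity
    substrate_distance_angstrom: float  # closest distance to the bound substrate peptide


def is_candidate(m: DoSMutation) -> bool:
    """Broad tier: the hit position has a specificity score above 0.9."""
    return m.specificity_score > 0.9


def is_max_stringency(m: DoSMutation) -> bool:
    """Strict tier: score of 1.0, literature support, and closer than 3 Å to the substrate."""
    return (
        m.specificity_score >= 1.0
        and m.literature_dos
        and m.substrate_distance_angstrom < 3.0
    )
```

With invented records, a mutation at a position scoring 0.95 but lying 6 Å from the substrate passes only the broad tier, whereas one at a literature-supported position scoring 1.0 and lying 2.5 Å away passes both.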

As detailed in the Supplemental Experimental Procedures,
selecting the cases most amenable to experimental validation
narrowed our candidates down to mutations on three positions
in direct contact with the substrate and with high KINspect scores
(Creixell et al., 2015). We then cloned, expressed, and purified
these mutant kinases together with their wild-type counterparts
(Figures 4A-4E). First, we purified the two PKCγ mutants,
D484G and M501I, predicted to perturb the determinants
of specificity in alignment positions 651 and 995, respectively
(Figures 4A and 4B). Since the determinant of specificity perturbed
by the mutant variant D484G was located four residues
downstream of the conserved HRD motif of the kinase domain,
we named this determinant HRD+4 (Figure 4A). Given this
spatial location and its proximity to the P-2 position of the
substrate peptide, we predicted that this first mutant would affect P-2
specificity. In contrast, the mutant variant M501I was found
immediately downstream of the conserved DFG motif within
the kinase activation loop (DFG+1), a residue for which there is
recent evidence of a role in driving serine/threonine specificity
at the phosphorylation site position (P0, i.e., the central S/T(/Y)
residue) (Chen et al., 2014). As shown in Figures 4B and S4,
experimental determination of the peptide specificity of both variants
by positional scanning peptide library (PSPL) (Hutti et al., 2004)
corroborated the specificity drift of both mutants. In the
case of the variant PKCγ D484G, our results uncovered a loss
of Arg preference in position P-2 of the substrate peptide
(Figures 4B and S4). As predicted for the variant M501I,
PSPL results demonstrated a change in phosphoacceptor residue
preference from Ser to Thr (Figure 4B). This specificity “switch” was further confirmed by
performing phosphorylation assays on both the wild-type and
mutant variants using a pair of matched peptide substrates of
identical sequence save for having Ser or Thr in the P0 position
(Figure 4C). As seen with PSPL analysis, WT PKCγ preferred
Ser over Thr, while the M501I mutant by contrast phosphorylated
the Thr peptide more efficiently. Given that PKCγ is a critical
regulator of migration in development (Kramer et al., 2002),
that it has been linked to metastasis (Yang et al., 2014), and
that its overexpression in epithelial cells triggers a malignant
phenotype and tumorigenic behavior in vivo (Mazzoni et al.,
2003), we speculate that the specificity drifts predicted by ReKINect
could confer tumorigenic, invasive, and metastatic
capabilities on cancer cells. While these PKCγ mutants were
identified in lung cancer samples (Kan et al., 2010), wild-type
PKCγ is typically expressed only in the brain (Sundram et al.,
2011). Interestingly, PKCγ was overexpressed in the tumor
bearing the M501I mutation (Figure 4D) to levels substantially
higher than in tumors where this genomic region had been
amplified (as reported by cBioPortal [Gao et al., 2013]). A recent
report highlighted loss-of-function mutations on PKC kinases



Figure 4. NAMs Causing Downstream Rewiring 

(A) Three positions in direct contact with the substrate peptide, named αD1, HRD+4, and DFG+1, and likely involved in determining specificity for substrate
positions P-3, P-2, and P0 (i.e., the phospho-acceptor site), respectively, harbor several cancer somatic mutations, three of which were selected for experi- 
mental validation. 

(B) Experimental validation by positional scanning peptide library (PSPL) array of the specificity drift caused by downstream rewiring NAMs. Heat maps show
normalized, averaged data from two independent experiments illustrating the specificity drift for the cancer variants PKD1 D665N and PKCγ D484G and M501I
in substrate positions P-3, P-2, and P0, respectively. The results are also shown in logo form plotting the normalized information content in the wild-type 
and mutant specificity switch position (logos generated using Seq2Logo [Thomsen and Nielsen, 2012]). 

(C) The P0 specificity switch of the PKCγ variant M501I was subsequently confirmed by quantifying the phosphorylation rate of identical peptide substrates
containing either Ser or Thr at the phosphorylation site position (RRRRRSWYFGG and RRRRRTWYFGG) by mutant and wild-type kinase variants. The graph 
shows the mean ± SD (n = 4). 

(D) PKCγ expression levels are markedly increased in the tumor sample harboring the PKCγ M501I downstream rewiring mutation.

(E) Comparison of the differences in substrate specificity typically observed between wild-type human kinases (gray histogram) and those mutant kinases 
reported here (black arrows). As evident from the plot, in two out of the three cases, the magnitude of the specificity drift caused by the cancer mutations is 
comparable to the specificity difference existing between different wild-type kinases. 

See also Figure S4. 
Figure 5. NAMs Causing Upstream Rewiring

(A) Upstream rewiring mutations will cause a new kinase (from K1 to K2) to phosphorylate the mutant protein (S, red). By plotting the probability of both kinases to 
phosphorylate the wild-type and mutant variants of the protein, we can visualize, quantify, and compare different upstream rewiring mutations. 

(B) The rewiring power and the rewiring angle can be computed by considering the necessary trajectory that the mutation causes (from the “origin” right-bottom 
triangle to the “destination” left-top triangle). The rewiring power is equivalent to the magnitude of the vector and measures the rewiring capacity of the mutation. 
The rewiring angle is the angle of the vector from the diagonal and distinguishes whether the rewiring effect is mainly driven by kinase resignation (i.e., a loss of 
phosphorylating ability of the wild-type kinase, angle >45°), depicted in blue, or by kinase take-over (i.e., an increase of phosphorylation ability of a new kinase, 
angle <45°), depicted in green. The three examples illustrate how three different mutations (A-C) can lead to different outcomes, such as the same rewiring power 
but different main driving force (A and B) or the same driving force but different magnitude (B and C). 

(C) Illustration of the two main driving processes that cause upstream rewiring, namely the reduced ability of the original kinase to phosphorylate the new mutant 
substrate variant (resignation) and the increased ability of a second kinase to phosphorylate the mutant substrate protein (take-over). 

(D) Representation of all the upstream rewiring mutations identified in the global repository of somatic mutations at different distances relative to the phos- 
phorylation site (from five residues before a phosphorylation site, P-5, to five residues after a phosphorylation site, P+5). Rewiring events mainly driven by 
resignation are shown in blue and those mainly driven by take-over are shown in green. 



(Antal et al., 2015), including PKCγ. Because their substrate
specificity is altered, the two PKCγ mutants characterized here are
likely both to lose the ability to phosphorylate some endogenous
substrates and to gain the capacity to phosphorylate new
substrates.

Next, we purified the mutant variant of PKD1 predicted to
perturb the determinant of specificity in alignment position
494 (Figures 4A and 4B), which we named αD1 given its location
at the first residue of α-helix D of the kinase domain. As with
the PKCγ mutants, the PSPL experiments validated the
specificity drift in PKD1 D665N (Figures 4B and S4). Specifically, this
mutation causes loss of an essential feature of the WT kinase
phosphorylation signature, namely selectivity for Arg at the
P-3 position. An Arg residue is found at the P-3 position in
critical targets of PKD1, including CREB, SSH1L, HDACs 5
and 7, HPK1, MARK2, and HSP27. This variant is therefore
expected to perturb signaling downstream of PKD1, a kinase
with roles in the development and metastatic progression of
several cancers, including prostate, breast, gastrointestinal,
pancreatic, and skin cancers (Sundram et al., 2011). Because this
mutation originated from a prostate cancer sample (Lindberg et al.,
2013), potential deregulation between PKD1 and its substrate
HSP27 is particularly notable, as HSP27 phosphorylation is closely
related to androgen receptor function in prostate cancer (Hassan
et al., 2009; Sundram et al., 2011). In addition to breaking these
interactions in the signaling network, because the D665N mutation
renders PKD1 a less specific kinase, we anticipate that the mutation
will generate many new connections through phosphorylation
of non-native substrates, some of which may contribute to
the malignant phenotype.

We next assessed the magnitude of the specificity switches
caused by these cancer mutations by comparing the
wild-type-to-mutant specificity drift with the specificity differences
observed between wild-type human kinases across the kinome.
As shown in Figures 4E and S4, two out of the three downstream 
rewiring mutations cause a specificity drift of a magnitude com- 
parable to the specificity difference that exists between different 
wild-type human kinases. Effectively, this implies that a single 
cancer mutation can lead to a specificity switch that is analogous 
to a new kinase appearing in the genome. 
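The comparison in Figure 4E places one wild-type-to-mutant distance within the distribution of distances between pairs of wild-type kinases. A minimal Python sketch of that idea, assuming each kinase's specificity at one substrate position is summarized as a vector of non-negative preference scores (for example, over the 20 amino acids) and that Euclidean distance between normalized vectors is an acceptable dissimilarity. Both the representation and the metric are assumptions, not necessarily the measure used for the figure, and the function names are hypothetical.

```python
import math
from itertools import combinations


def normalize(prefs: list[float]) -> list[float]:
    """Scale non-negative preference scores (with a positive sum) so they sum to 1."""
    total = sum(prefs)
    return [p / total for p in prefs]


def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two normalized preference vectors."""
    return math.dist(normalize(a), normalize(b))


def drift_percentile(wt: list[float], mutant: list[float], wild_types: list[list[float]]) -> float:
    """Percentage of wild-type/wild-type distances smaller than the wild-type/mutant drift."""
    drift = distance(wt, mutant)
    background = [distance(x, y) for x, y in combinations(wild_types, 2)]
    if not background:
        raise ValueError("need at least two wild-type kinases for a background distribution")
    return 100.0 * sum(d < drift for d in background) / len(background)
```

A high percentile would correspond to the situation reported for two of the three mutants, where the drift is comparable to the difference between two distinct wild-type kinases.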

With these validated examples at hand, we set out to further
investigate whether other cancer mutations could cause similarly
dramatic specificity drifts and switches in other human kinases.
By analyzing ReKINect predictions for cancer mutations that
hit validated DoS residues (Table S1), in many
cases with amino acid substitutions analogous to the ones we
experimentally tested above, we could indeed identify additional
cancer mutations that are highly likely to cause downstream
rewiring. In the case of the HRD+4 site, 41 additional cancer
mutations were identified, substituting this site with multiple other
residues (Table S1).



Moreover, in addition to the PKCγ M501I mutant, we could
identify 29 other cancer mutations hitting the DFG+1 site, eight
of which involve analogous substitutions of large hydrophobic
residues with β-branched aliphatic residues (Haspin L669I, DDR1
M793I, ITK M503V, TRKA M671T, IRAK3 M314I/M341V/
M341T, and BRAF L597V), the type of substitution that most
likely leads to a specificity switch from a preference for
phosphorylating Ser to Thr (Figure S4; Table S1). In contrast, no
mutant was found that would perturb specificity in the opposite
direction (from Thr to Ser phosphorylation preference; Figure S4).
Thus, there appears to be a general trend toward increased
phosphoThr-driven signaling in cancer.

Of these 29 mutants, the identification of a likely mechanism of 
action for BRAF L597V is of critical relevance as it is not only a 
germline mutation in Noonan syndrome and cardio-facio-cuta- 
neous syndrome, but also plays a significant role in the develop- 
ment of cancer when acting in epistatic synergy with Ras G12V 
(Andreadi et al., 2012; Davies et al., 2002). While the molecular 
mechanisms of this epistatic interaction could potentially be 
linked to changes in BRAF dimerization, our results suggest 
that Ras G12V could ensure the hyperactivity of this signaling 
network, whereas BRAF L597V rewires it by a drift in BRAF’s 
kinase specificity. Such a scenario is reminiscent of previously described
interactions between different mutations promoting cancer
development in a synergistic manner (Creixell et al., 2012a; Wu
et al., 2010). Finally, we could identify 40 cancer mutations in
addition to the PKD1 D665N mutation perturbing the αD1 site,
eight of which carry the same amino acid substitution, D
to N (PKCβ D427N, TSSK1 D97N, TTBK1 D116N, CDK11b
D507N, CDK8 D103N, PFTAIRE1 D198N, PDGFRα D681N, and
STYK1 D201N), and thereby constitute high-confidence
downstream rewiring mutants (Figure S4; Table S1).

Altogether, these results represent the discovery of three new
downstream rewiring mutations on three distinct determinants of
specificity (HRD+4, DFG+1, and αD1) and show that single-point
NAMs can drive downstream rewiring of a magnitude that is
analogous to a new kinase suddenly appearing within the human
kinome. They also suggest that the prioritized collection of
mutations we provide is likely to contain even more cancer
mutations causing rewiring (16 of which are clear high-confidence
candidates; Figure S4).

Upstream Kinase Rewiring 

Complementary to the downstream rewiring NAMs, we next
investigated whether mutations could also cause upstream
rewiring (i.e., when, owing to a mutation, a substrate becomes
phosphorylated by different upstream kinases) by perturbing
phosphorylation motifs on the substrate (Figure 1). By
analyzing mutations that fall within 5 flanking residues of known 
phosphorylation sites (see Experimental Procedures) with the 
NetPhorest (Miller et al., 2008) and NetworKIN (Linding et al., 
2007) algorithms on the wild-type and mutant variants of the 



(E) Quantification of the percentage of mutations leading to upstream rewiring depending on their position relative to the phosphorylation site. 

(F) Assessment of the median magnitude of rewiring for mutations based on their position relative to the phosphorylation site. 

(G) The median rewiring angle (orange and yellow bars) and the ratio of take-over to resignation rewiring mutations (gray line) conditioned on the position of the
mutation relative to the phosphorylation site. 

See also Figure S5. 
same protein, we could predict the likely upstream rewiring
effects of mutations on substrates. As detailed in Figure 5 (see
also Figure S5 and Tables S2 and S3), for a given predicted re- 
wiring event (i.e., where the upstream kinase predicted for the 
wild-type and mutant variants of the substrate is non-identical) 
we defined two variables termed “rewiring power” and “rewiring 
angle” based on the predicted probability for the most likely up- 
stream kinase in the wild-type and mutant substrate variants 
(Figures 5A and 5B). 
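A minimal Python sketch of the two variables, assuming the rewiring vector in Figure 5B has one component for the loss of phosphorylation probability by the original kinase (resignation) and one for the gain by the new kinase (take-over), with the angle measured so that 45° separates the two regimes as stated in the figure legend. The exact axis orientation of the published plot is an assumption here, the function names are hypothetical, and the probabilities in the usage note are illustrative.

```python
import math


def rewiring_vector(p_old_wt: float, p_old_mut: float,
                    p_new_wt: float, p_new_mut: float) -> tuple[float, float]:
    """Return the (resignation, take_over) components of the rewiring trajectory."""
    resignation = p_old_wt - p_old_mut  # how much the original kinase loses
    take_over = p_new_mut - p_new_wt    # how much the new kinase gains
    return resignation, take_over


def rewiring_power(resignation: float, take_over: float) -> float:
    """Magnitude (Euclidean norm) of the rewiring vector."""
    return math.hypot(resignation, take_over)


def rewiring_angle(resignation: float, take_over: float) -> float:
    """Angle in degrees: 0 = pure take-over, 90 = pure resignation, 45 = balanced."""
    return math.degrees(math.atan2(resignation, take_over))


def driving_force(angle: float) -> str:
    """Classify using the 45-degree boundary described for Figure 5B."""
    if angle > 45.0:
        return "resignation"
    if angle < 45.0:
        return "take-over"
    return "balanced"
```

For example, if the original kinase's probability drops from 0.9 to 0.2 while the new kinase's rises from 0.1 to 0.3, the components are 0.7 and 0.2, the power is about 0.73, and the angle (about 74°) classifies the event as resignation-driven, the pattern the text reports as predominant.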

The rewiring power measures the magnitude of the rewiring 
event, by accounting for the loss of phosphorylation potential 
of the old upstream kinase as well as the gain in phosphorylation 
potential of the new upstream kinase. The number of rewiring 
events and their rewiring power showed a non-uniform distribu- 
tion where, generally, mutations closer to a phosphorylation 
site have a higher chance of causing upstream rewiring and 
the rewiring event itself will be of higher magnitude (rewiring 
power) (Figures 5D-5F). This global trend is observed for all po- 
sitions except for the position just before the phosphorylation 
site, P-1, where mutations are less likely to lead to rewiring 
events and will most often be of lower magnitude. In fact, this
distribution, with its singularity at P-1, resembles the positional
distribution of information content of kinase substrate specificity
(Figure S5), underlining a fundamental link between the criticality
of a given position for substrate recognition by upstream kinases
and the disruptive potential of cancer somatic mutations hitting
those positions. In other words, positions critical to and in direct
contact with the phosphorylating kinase (e.g., P+1, P+2,
P-2, or P-3, as opposed to P-1, which makes very few contacts
with the kinase) (Brinkworth et al., 2003) are far more likely to
harbor strongly rewiring mutations.
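The positional analysis in Figures 5E and 5F amounts to grouping mutations by their offset from the phosphorylation site and computing, for each offset, the fraction predicted to rewire and the median rewiring power of those that do. A minimal Python sketch over hypothetical `(offset, rewires, power)` records; the record layout and the function name are assumptions, and any input values are illustrative rather than data from the study.

```python
from collections import defaultdict
from statistics import median


def positional_summary(records):
    """Map each offset (e.g. -5..+5) to (fraction rewiring, median power of rewiring events).

    `records` is an iterable of (offset, rewires, power) tuples; `power` is ignored
    when `rewires` is False. Offsets without rewiring events get a median of None.
    """
    by_offset = defaultdict(list)
    for offset, rewires, power in records:
        by_offset[offset].append((rewires, power))
    summary = {}
    for offset in sorted(by_offset):
        rows = by_offset[offset]
        powers = [p for r, p in rows if r]
        fraction = len(powers) / len(rows)
        summary[offset] = (fraction, median(powers) if powers else None)
    return summary
```

Under the trend described above, P+1 would be expected to show a higher fraction and median power than P-1.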

The repertoire of potential upstream rewiring events allowed
us to address the central question of whether rewiring is most
often driven by an increased propensity of a new kinase to
phosphorylate the mutant substrate variant (which we term
“kinase take-over”) or, conversely, by a reduced propensity of
the original kinase upon mutation (which we term
“kinase resignation”) (Figure 5C). The rewiring angle does, in
effect, measure which of the two forces is stronger, with rewir- 
ing events mainly driven by kinase take-over leading to a rewir- 
ing angle <45° from the diagonal in Figure 5B, while rewiring 
events mainly driven by kinase resignation would be associated 
with rewiring angles >45°. As shown in Figure 5G, our results 
based on the median rewiring angle as well as the ratio of 
take-over/resignation events measured at different positions 
relative to the phosphorylation site show that, regardless of 
the position, upstream rewiring events are predominantly driven 
by kinase resignation forces. Illustrative examples of this can 
be found in Tables S2 and S3, where many of the most strongly 
rewiring events are caused by cancer mutations disrupting, for 
instance, proline residues in P+1 positions of CDK substrates. 
For example, a mutation at position 721 of damage-specific DNA
binding protein 1 (DDB1 P721Q), adjacent to a phosphorylation
site, is predicted to cause an upstream shift from
CDK1 to ATM, and a similar mutation on CCP110 (CCP110
P171L) leads to a predicted upstream rewiring from CDK1 to
PLK1 (Table S3). Finally, mutations on ORC1 (P312S), CDC23
(P583T), and NUMA1 (P113H) illustrate how disruption of
pleiotropic recognition motifs, such as the one for CDK1 kinase, can
lead to upstream rewiring events.

Overall, these results suggest that cancer mutations may 
rewire upstream signaling typically by worsening an optimal sub- 
strate site for a given upstream kinase and not by generating 
a more optimal substrate better matching another upstream 
kinase. Given that it has taken millions of years
to evolve exquisitely fine-tuned motifs around phosphorylation
sites that confer signaling specificity and fidelity (Tan
et al., 2009; Zarrinpar et al., 2003), it is not surprising that
cancer mutations most often perturb this finely evolved system
by generating weaker phosphorylation motifs.

Constitutive Activation and Inactivation of Kinases 

As a final group of NAMs on protein kinases, we also analyzed 
the presence of mutations that would lead to the constitutive 
activation and inactivation of protein kinases (Figure 1). 

Starting with the former case, we used the so-called
phospho-mimicking effect of acidic mutations in close proximity to
(just before, P-1, or after, P+1) activating phosphorylation sites
on the activation segments of kinase domains (Davies et al.,
2002) to identify in silico missense mutations that can result in
a constitutively active kinase.
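The activation rule above can be sketched as a predicate: the mutant residue is acidic (Asp or Glu) and sits immediately before or after an annotated activating phosphorylation site. A minimal Python sketch, assuming the activating sites of a kinase are supplied as residue numbers; the function is hypothetical and ignores the additional evidence ReKINect may weigh. HPK1 A164D (next to T165) is the example from the text; for BRAF V600E, the assumed annotation is the activation-segment phosphorylation site T599, which V600 immediately follows.

```python
ACIDIC = frozenset("DE")  # Asp and Glu mimic the negative charge of a phosphate


def is_phosphomimicking(mut_position: int, mut_residue: str, activating_sites: set[int]) -> bool:
    """True if an acidic substitution lands at P-1 or P+1 of an activating phosphosite."""
    if mut_residue not in ACIDIC:
        return False
    # Mutation at P-1 means the site is at position + 1; at P+1, the site is at position - 1.
    return (mut_position + 1) in activating_sites or (mut_position - 1) in activating_sites
```

With these annotations, `is_phosphomimicking(164, "D", {165})` and `is_phosphomimicking(600, "E", {599})` both return True, while a non-acidic substitution at the same positions does not.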

Taking the well-characterized case of BRAF V600E, a
phospho-mimicking activating mutation, as a positive test case, we
confirmed that ReKINect could identify this mutation in one of
the ovarian cell lines (ES2) and predict it as kinase activating.
We subsequently confirmed experimentally the hyperphosphorylated
state of the BRAF substrate MEK by immunoblot
in the ES2 line (Figures 6A and 6B).

In addition to this well-known case, ReKINect predicted 23
other instances of potential constitutively activating kinase
mutations (Table S4). Although some of these mutations fall near
or on phosphorylation sites that have not yet been shown to
regulate enzymatic activity, for a considerable fraction of them
there is substantial evidence that they could lead to kinase activation
(Table S4). One notable example of a predicted phospho-mimicking
mutation was identified on hematopoietic progenitor
kinase 1 (HPK1), namely the mutant variant HPK1 A164D.
Alanine 164 is immediately adjacent to the activating phos- 
phorylation site T165 on the activation segment of HPK1, and 
mutation to Asp is predicted to confer constitutive activation of 
HPK1 and the likely engagement of its downstream JNK and
NF-κB signaling (Arnold et al., 2005). Thus, the ReKINect
predictions suggest a role for cellular stress response and potentially
hematopoietic involvement in lung cancer, the cancer type in 
which this mutation was identified. 

To model kinase inactivating mutations, we hypothesized that
mutations that alter catalytically essential residues (e.g.,
residues mediating ATP binding, Mg²⁺ coordination, or phosphotransfer,
as defined in Zeqiraj and van Aalten, 2010, and Table
S5) could lead to kinase inactivation. The high number of
instances identified by ReKINect and detailed in Table S6 
(427 unique kinase inactivation events) suggests that a large 
number of kinases become inactivated during cancer develop- 
ment. While it has previously been shown that inactivating 
mutations on kinases could lead to Peutz-Jeghers syndrome 
(Mehenni et al., 1998) or to pseudo-kinases throughout natural 
Figure 6. Constitutive Activation and Inactivation of Kinases by NAMs

(A) ReKINect identified ES2 cells as containing the constitutively activating BRAF V600E mutation. 

(B) An immunoblot and associated quantification illustrating the phosphorylation of the BRAF substrate MEK in the mutant cell line ES2 (in red) compared with the
wild-type cell lines (in black), using total MEK and β-tubulin for normalization.

(C) ReKINect identified several cancer mutations in catalytically essential residues of kinase domains. 

(D) A quantification of all mutations from the global repository of cancer somatic mutations predicted to inactivate kinases and the catalytically essential positions 
they hit. Mutations leading to kinase domain catalytic inactivation are enriched (χ² test, p = 1.69 × 10⁻¹⁶) in cancer somatic mutations (with particular preference
for the aspartate, D, and glycine, G, in the DFG motif). 



evolution (Zeqiraj and van Aalten, 2010), our results indicate that 
kinase inactivation may hitherto have been largely under-appre- 
ciated in the interpretation of cancer genomes. 

A closer inspection of these predicted inactivating mutations
reveals a bias toward specific critical residues. In particular, the
first and third residues of the DFG motif (i.e., the aspartate and
glycine, respectively), which defines the start of the activation
segment, harbor approximately one-third of all inactivating
mutations (Figures 6C and 6D). While mutations in other essential
residues are likely to lead equally to kinase inactivation (see
Table S5 for further information on the kinase catalytically
essential residues included as part of ReKINect), our results
suggest a significant preference for these two residues being
mutated in the context of kinase inactivating mutations in
cancer (χ² test, p = 2.2 × 10⁻¹⁶). Thus, these two positions of
the DFG motif are predicted to constitute structural and
biochemical hotspots for NAMs leading to inactivation of protein
kinases in cancer. 
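The enrichment claims for the DFG residues rest on χ² tests. A minimal Python sketch of Pearson's goodness-of-fit statistic, comparing observed mutation counts per catalytically essential residue with counts expected under a uniform null. The uniform null, the residue categories, and any counts used with it are assumptions for illustration, not the paper's setup or data. Turning the statistic into a p-value requires the χ² distribution with (categories − 1) degrees of freedom, for example via `scipy.stats.chisquare`.

```python
def chi_square_statistic(observed: list[int], expected: list[float]) -> float:
    """Pearson's chi-square: sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def uniform_expected(observed: list[int]) -> list[float]:
    """Expected counts if mutations were spread evenly across the categories."""
    total = sum(observed)
    return [total / len(observed)] * len(observed)
```

With invented counts of 30, 10, 10, and 10 mutations across four residues, the expected count is 15 per residue and the statistic is 20.0 on 3 degrees of freedom; a skew toward one or two residues, as reported for the DFG aspartate and glycine, is what drives the statistic up.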

Overall, these results suggest that ReKINect is capable of 
predicting NAMs that constitutively activate or inactivate protein 
kinases and that, in addition to BRAF V600E, numerous other 
similar mutations are likely to exist that directly affect the cata- 
lytic activity of kinases in cancer signaling. 

Functional Mutations in SH2 Domains and Global 
Phenotypic Impact of NAMs 

The SH2 domain is of seminal importance to signaling fidelity, 
cellular organization, and function across Metazoan species 
and is often part of protein kinases and perturbed in human
disease (Bibbins et al., 1993; Manning et al., 2002; Marengere et al.,
1994; Pawson et al., 2001). Thus, we reasoned that the inclusion
of the SH2 domain as part of ReKINect would enable us to make 
integrated predictions of higher accuracy and relevance from a 
signaling perspective. SH2 domains possess an essential Arg
residue, located within the highly conserved FLVRES sequence
motif, that makes direct contact with phosphoTyr residues
in their binding partners (Bibbins et al., 1993). By incorporating
this critical residue for the phosphotyrosine binding function
of SH2 domains into ReKINect, we could predict 20 distinct
instances (including mutations on ABL, SYK, and GRB10) where
cancer mutations disrupt a critical functional residue, thus
impairing the ability of the mutant SH2 domains to bind their
substrates (Table S7).

As with the mutations causing kinase downstream rewiring,
by mapping cancer mutations onto the determinants of
specificity of the SH2 domain identified by our algorithm KINspect
(Creixell et al., 2015), ReKINect predicted 93 NAMs causing
SH2 downstream rewiring (Table S8) by changing positions
within the domain that show a high likelihood of playing a critical
role in substrate specificity (specificity score higher than 0.9).
The comparatively lower number of inactivating and downstream
rewiring mutations in SH2 domains compared with kinase
domains is attributable, at least in part, to the smaller number and
size of SH2 domains in comparison with kinase domains
(Figure S1).
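A minimal sketch of the filtering step described above, assuming each mutation has already been mapped to a column of the SH2 alignment and that per-column specificity scores (as produced by KINspect) are available as a dictionary. The data class, function name, and any example scores are hypothetical; only the 0.9 cut-off comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedMutation:
    protein: str
    label: str             # missense label, e.g. "R40Q" (hypothetical)
    alignment_column: int  # column in the SH2 domain alignment


def downstream_rewiring_candidates(
    mutations: list[MappedMutation],
    specificity_scores: dict[int, float],
    threshold: float = 0.9,
) -> list[MappedMutation]:
    """Keep mutations at alignment columns whose specificity score is
    strictly higher than `threshold`; unscored columns count as 0."""
    return [
        m for m in mutations
        if specificity_scores.get(m.alignment_column, 0.0) > threshold
    ]
```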

Finally, to systematically explore the potential functional or
phenotypic impact of the NAMs described above, we performed
RNAi knockdown of kinase and SH2-domain-containing
proteins across the ovarian cancer cell lines. The effect of these
perturbations on nuclear number was then determined using a
regressor network model of protein-protein interactions and
NAMs (Figure S3; Experimental Procedures). We found that if
ReKINect-classified NAMs were present in the network vicinity
of the RNAi target gene, a significant impact on the phenotypic
response, either pro- or anti-proliferative, was observed (p =
7.1 × 10^-13). These results would suggest that network-attacking
mutations predicted by ReKINect are not only biochemically
functional but also lead to significant phenotypic changes
in cancer cell models on a global scale.
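The notion of a NAM lying in the "network vicinity" of an RNAi target can be made concrete as a bounded breadth-first search over a protein-protein interaction graph. This sketch assumes an undirected graph stored as adjacency sets and a user-chosen hop limit; it does not reproduce the regressor network model itself, and all names are hypothetical.

```python
from collections import deque
from typing import Iterable


def genes_within(graph: dict[str, set[str]], source: str, max_hops: int) -> set[str]:
    """All genes reachable from `source` in at most `max_hops` interaction
    edges (breadth-first search); the source itself is excluded."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        gene = queue.popleft()
        if hops[gene] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(gene, set()):
            if neighbour not in hops:
                hops[neighbour] = hops[gene] + 1
                queue.append(neighbour)
    return set(hops) - {source}


def nam_in_vicinity(graph: dict[str, set[str]], rnai_target: str,
                    nam_genes: Iterable[str], max_hops: int = 1) -> bool:
    """True if any NAM-carrying gene lies within `max_hops` of the target."""
    return bool(genes_within(graph, rnai_target, max_hops) & set(nam_genes))
```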

DISCUSSION 

Given that protein kinases are one of the protein classes most
frequently encoded by cancer genes (Futreal et al., 2004) and
mutated in cancer (Figure S1), as well as a major molecular target
of therapeutic drugs (Anastassiadis et al., 2011; Davis et al.,
2011), it is essential to identify how phosphorylation-based
signaling networks drive cancer. Thus, the distinct NAMs
cataloged by ReKINect represent promising new leads
for future studies. Serving as a systematic complement to previous
efforts identifying the function that individual cancer mutations
may play (Davies et al., 2002; Friend et al., 1986; Stehelin
et al., 1976), ReKINect is designed to predict the underlying 
signaling mechanisms and perturbations caused by mutations 
in cancer, or other complex diseases, using first principles 
governing protein function from evolution, protein chemistry, 
and protein structure and architecture. 

Evidence for NAMs and Signaling Trends in Cancer 

Through integration of low- and high-throughput computational
and experimental technologies, we have discovered the
existence of the NAMs described in Figure 1. Having analyzed the
data generated here and in global genome sequencing efforts,
we conclude that there is ample evidence supporting the
hypothesis that all the different types of NAMs described do indeed
occur in cancer.

In addition, our results uncovered a variety of interesting
signaling trends resulting from cancer mutations. First, our results
demonstrate the existence of new molecular logic gates in
cancer. The genesis of new phosphorylation sites by mutations, as
uncovered here, illustrates how cancer cells can acquire novel
and prominent signaling flows and altered information
processing that may allow new phenotypic states to be reached.

We identified and experimentally confirmed three striking
examples of cancer mutations directly leading to a catalytic
specificity drift: PKD1 D665N, PKCγ D484N, and M501I. Downstream
rewiring had until now been the most elusive type of NAM, as
reflected by the fact that only a single instance of this type of
mutation, where a kinase is altered in specificity through mutation,
RET M918T, had been reported in the literature (Borrello et al.,
1995; Santoro et al., 1995; Songyang et al., 1995). The discovery
of these three new NAMs, using a global yet selective and
sensitive approach, would suggest that many more such events could
exist in cancer.

Supporting this, we could pinpoint 16 additional cancer 
mutations that, given that they harbor identical amino acid sub- 
stitutions to the ones we tested, are most likely to also encode 
downstream rewiring events. Next, when studying NAMs that 
would lead to downstream rewiring on a position that was 
recently confirmed to drive peptide specificity on the phospho- 
acceptor of phosphorylation sites (Chen et al., 2014), we could 
identify nine cancer mutations that are predicted to steer 
signaling toward phosphorylation of Thr, whereas no mutants 
were found in the opposite direction (Figure S4). Despite the 
fact that these numbers are not sufficiently high to enable robust 
statistics and that a large number of wild-type kinases originally 
encode Ser-directing residues in the DFG+1 position (thereby 
partially explaining the lack of Thr-to-Ser mutants), this bias
suggests that specific cancers may harbor increased Thr-based
signaling. Given that, owing to its unique mutational and
physicochemical properties, serine has been identified as a mutational
hub (Creixell et al., 2012b) and is thereby a likely result of cancer
mutations, we speculate that through such Ser-to-Thr signaling
rewiring, cancer cells might evolve less dependency on serine
signaling.
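The DFG+1 argument can be sketched as a lookup from residue to predicted phosphoacceptor preference. The residue classes below are our paraphrase of Chen et al. (2014), in which large hydrophobic residues at DFG+1 favor Ser and β-branched residues favor Thr; treat them as an assumption to verify against that work. Function names are hypothetical.

```python
# Assumed residue classes (paraphrasing Chen et al., 2014; verify before use).
SER_DIRECTING = {"L", "F", "M"}  # large hydrophobic residues at DFG+1
THR_DIRECTING = {"I", "V", "T"}  # beta-branched residues at DFG+1


def phosphoacceptor_class(residue: str) -> str:
    """Predicted phosphoacceptor preference for a DFG+1 residue."""
    if residue in SER_DIRECTING:
        return "Ser"
    if residue in THR_DIRECTING:
        return "Thr"
    return "unclassified"


def dfg_plus1_shift(wt_residue: str, mutant_residue: str) -> str:
    """Report a predicted Ser/Thr shift only when both residues are classified
    and the class changes; otherwise report no predicted shift."""
    before = phosphoacceptor_class(wt_residue)
    after = phosphoacceptor_class(mutant_residue)
    if "unclassified" in (before, after) or before == after:
        return "no predicted shift"
    return f"{before} -> {after}"
```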

Furthermore, if downstream rewiring NAMs, by affecting a large
number of substrates, are likely to have a broader impact
at the network level, then the fact that two of the downstream
rewiring mutants described here lead to a less specific kinase, and
are thus likely to phosphorylate more substrates, further
highlights the potential impact of these NAMs. These results
significantly extend our previous observation that less-specific
kinases tend to be cancer mutation targets (Miller et al., 2008)
and serve as a critical resource that we hope will open a
new avenue of research on signaling specificity in cancer.

Similar to the case of kinase downstream rewiring, relatively 
few mutations on SH2 domains had been reported to obliterate 
tyrosine-binding or shift their specificity (Marengere et al., 
1994; Pawson et al., 2001), highlighting that this might also be 
a hitherto hidden and yet perhaps a fundamental signaling trend 
in cancer. 

In addition, an over-representation of kinase-resignation upstream
rewiring events suggests that cancer mutations most often lead
to upstream rewiring by worsening existing optimal substrates 
rather than generating super-optimal new substrates for other 
upstream kinases. Given the amount of fine-tuning achieved 
over millions of years of evolution at the substrate level (Tan 
et al., 2009; Zarrinpar et al., 2003), it is perhaps to be expected 
that mutations in substrates will most often lead to poorer phos- 
phorylation motifs. 
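Upstream rewiring by kinase resignation can be illustrated by scoring a phosphosite flank against a position-specific scoring matrix (PSSM) before and after mutation. The matrix below is a toy example invented for illustration, not a real kinase motif, and the function names are hypothetical. A negative difference corresponds to the poorer phosphorylation motif scenario described above.

```python
# Toy PSSM (hypothetical): offset from the phosphoacceptor -> {residue: score}.
# Residues not listed at a position score 0.
TOY_PSSM = {
    -3: {"R": 2.0, "K": 1.0},
    -2: {"R": 1.5},
    +1: {"P": 2.5, "D": -1.0},
}


def motif_score(peptide: str, center: int, pssm=TOY_PSSM) -> float:
    """Sum PSSM scores around `center` (0-based index of the phosphoacceptor);
    offsets falling outside the peptide are skipped."""
    total = 0.0
    for offset, weights in pssm.items():
        i = center + offset
        if 0 <= i < len(peptide):
            total += weights.get(peptide[i], 0.0)
    return total


def rewiring_delta(wt_peptide: str, mut_peptide: str, center: int) -> float:
    """Mutant minus wild-type score: negative means a poorer substrate motif."""
    return motif_score(mut_peptide, center) - motif_score(wt_peptide, center)
```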

Finally, our results suggest both the existence of previously
unknown constitutively activating mutations in kinases and
the presence of mutational hotspots at two specific positions
leading to the inactivation of protein kinases, namely the Asp and
Gly within the DFG motif at the beginning of the activation
segment. Future studies could elucidate why
these sites are preferred by cancer mutations when inactivating
kinases.

Non-Recurrent yet Functional NAMs 

While a large fraction of recurrent and/or conserved mutations
can directly or indirectly be considered NAMs, as they typically
perturb signaling networks (as exemplified here with BRAF
V600E) and operate as functional driver mutations,
in this study we have demonstrated that non-recurrent and
non-conserved mutations can also be functional NAMs (Figure 7B).
This may be most evident from the observation that downstream
rewiring mutations can lead to a switch of specificity of a
magnitude comparable to the specificity difference between two
distinct kinases in the human kinome. Thus, although
previous studies of cancer mutations, including some on
kinase domains, have disregarded non-recurrent variants as
non-functional passenger mutations (Greenman et al., 2007),
our results suggest that many of these do indeed have a
functional role. Still, pinning down the actual contribution of these
less frequent yet functional mutations, or combinations thereof,
and the contexts in which they drive oncogenesis will require a
concerted research effort by both the genomics and signaling
communities. If we move from a perception of oncogenes and
tumor suppressors operating in isolation to drive oncogenesis
toward a new paradigm in which numerous mutations play a
driving role under specific cellular contexts (e.g., when appearing
in combination with other mutations) (Creixell et al., 2012a;
Wu et al., 2010), it will be important to acknowledge that
several of these functional NAMs are likely to drive cancer in a
concerted fashion.

As shown in Figure 7A, some of the NAMs we have identified 
here are likely to impose dramatic alterations in signaling 
networks, such as specificity switches that are analogous to 
introducing a new kinase and thus may play a driving role in 
oncogenesis. 

The fact that the same signaling output can be achieved by
distinct cancer mutations through multiple strategies
(as shown, for instance, by inactivating mutations in Figure 6D),
and that we have identified a case in which a protein carrying one
of these functional NAMs is also overexpressed, further supports the
importance of such less frequent functional mutations (Figure 7B).

Perspectives 

Our results suggest that signaling networks are both dynami- 
cally and structurally rewired in cancer cells to an extent far 
beyond what was previously anticipated. Such rewiring in- 
cludes constitutive activation and inactivation of kinase and 
SH2 domains, upstream and downstream rewiring of phos- 
phorylation-based signaling, and the extinction and genesis of 
phosphorylation sites. These findings will be critical for network 
medicine efforts where drug targets for complex diseases are 
defined at the network level and for the individual patient or 
tumor. 

Here, we demonstrated six distinct NAMs as proof of principle
and verified that all the NAMs described in Figure 1A are present
in cancer. Future expansions of the KINspect (Creixell et al.,
2015) and ReKINect algorithms to include other protein 
domains, PTMs, and linear motifs, more complex genetic pertur- 
bations (such as copy-number variations or genomic re-arrange- 
ments leading to protein fusions) and the advancement of 
sequencing and MS technologies, will likely facilitate the discov- 
ery of many additional NAMs. Such advancements to link cancer 
genomic and proteomic data will become valuable resources for 
dealing with the intrinsic complexities of tumors (Weinberg, 
2014; Yaffe, 2013). 

In the last century, Schechter et al. (1984) and Ullrich et al.
(1984) connected the discovery of the oncogene Her-2/neu to
its hyperactivity in a fraction of breast cancers (Slamon et al.,
1987) and to the development of targeted therapies such as
trastuzumab (Carter et al., 1992). Others linked the discovery of
the BCR-ABL fusion protein (Rowley, 1973) to CML, leading to
the development of imatinib (Druker et al., 1996) and
newer-generation inhibitors. Similarly, we hope that ReKINect, and similar
tools, can be utilized to close the cancer mutation interpretation 
gap. Boosting genomic interpretation capacity should ideally 
parallel the rate at which next generation technologies identify 
new mutations in order to help meet the bench-to-bedside 
challenge (Figure 7C), assist clinicians in making better 
treatment decisions for those patients carrying infrequent yet 
functional cancer mutations and facilitate the development of 
novel “magic bullets” (Strebhardt and Ullrich, 2008) and preci- 
sion medicines (Creixell et al., 2012a). 
Figure 7. Evidence and Model for Functional Mutations and Tumor-Specific Network Medicine

(A) The functional mutations found in this study are clear examples of single amino acid mutations that can severely perturb signaling networks. 

(B) Our study shows how non-recurrent cancer mutations on non-conserved residues can be functionally important and that functional recurrent (orange) and 
non-recurrent (red) NAMs can converge at the signaling network level. We also identified a case where a functional mutation in a low-abundance protein is
accompanied by its overexpression.

(C) The deployment of tools like ReKINect should enable the proposition of more refined signaling mechanisms underlying cellular cancer phenotypes and 
identification of driver and therapeutically relevant mutations. 



EXPERIMENTAL PROCEDURES 

Building Comprehensive Sets of Sequences: Kinome, SH2ome, 
and Phosphorylation Sites 

We built comprehensive sets of sequences covering all human kinase proteins 
(Manning et al., 2002), 120 SH2 domains (Liu et al., 2006), and a broad set of 
known human phosphorylation sites (Hornbeck et al., 2004). With these sets, 
we performed domain-centered sequence alignments using ClustalW and
Clustal Omega (Sievers et al., 2011), followed by manual refinement.
These alignments were then used to identify functional residues, which
were mapped back to the wild-type version of the mutant
sequences analyzed with ReKINect. Similarly, phosphorylation site peptides
were matched to the wild-type variants of all mutations, so that the distance
between each mutation and its closest phosphorylation sites could be
determined.
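The mapping step in this procedure can be sketched as follows: alignment columns are converted back to residue numbers of the ungapped wild-type sequence, and the distance to the closest phosphorylation site is taken along the sequence. The helper names and the choice of linear sequence distance are assumptions made for this illustration.

```python
from typing import Iterable, Optional


def column_to_residue(aligned_seq: str, gap_chars: str = "-.") -> dict[int, int]:
    """Map each 0-based alignment column to the 1-based residue number of the
    ungapped sequence; gap columns are absent from the result."""
    mapping: dict[int, int] = {}
    residue = 0
    for column, char in enumerate(aligned_seq):
        if char in gap_chars:
            continue
        residue += 1
        mapping[column] = residue
    return mapping


def nearest_site_distance(mutation_pos: int,
                          site_positions: Iterable[int]) -> Optional[int]:
    """Sequence distance (in residues) from a mutation to its closest known
    phosphorylation site, or None if the protein has no annotated site."""
    sites = list(site_positions)
    if not sites:
        return None
    return min(abs(mutation_pos - s) for s in sites)
```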

Collecting a Global Repository of Somatic Cancer Mutations 

We compiled a global set of publicly available somatic cancer mutations from
COSMIC v67 (Forbes et al., 2011) and generated the FASTA files required by
ReKINect containing both the wild-type and mutant versions of all coding 
missense variants, using purpose-made Python scripts and ENSEMBL’s 
VEP resource (Flicek et al., 2014). 
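A minimal sketch of generating paired wild-type and mutant FASTA records from a missense label such as V600E. It assumes single-residue substitutions in one-letter code and a hypothetical header convention; the purpose-made scripts and the VEP-based annotation used in this study are not reproduced.

```python
import re


def apply_missense(sequence: str, label: str) -> str:
    """Return the mutant protein for a label like 'V600E' (1-based position),
    checking that the stated wild-type residue matches the sequence."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if m is None:
        raise ValueError(f"unsupported mutation label: {label}")
    wt, pos, mut = m.group(1), int(m.group(2)), m.group(3)
    if not 1 <= pos <= len(sequence) or sequence[pos - 1] != wt:
        raise ValueError(f"{label} does not match the reference sequence")
    return sequence[:pos - 1] + mut + sequence[pos:]


def fasta_pair(gene: str, sequence: str, label: str, width: int = 60) -> str:
    """Wild-type and mutant records in FASTA format (header style is hypothetical)."""
    def wrap(s: str) -> str:
        return "\n".join(s[i:i + width] for i in range(0, len(s), width))
    mutant = apply_missense(sequence, label)
    return f">{gene}_WT\n{wrap(sequence)}\n>{gene}_{label}\n{wrap(mutant)}\n"
```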
Computing Minimum Distance to Substrate from PDB Files

Minimum distances to substrates were computed as described in the accom- 
panying article (Creixell et al., 2015) and further detailed in the Supplemental 
Experimental Procedures. 
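The minimum distance between a residue and a bound substrate reduces to the smallest pairwise Euclidean distance between their atoms. The sketch below assumes coordinates have already been parsed from the PDB file into (x, y, z) tuples. Parsing, atom selection, and the choice of structures are outside its scope, and the function name is hypothetical.

```python
import math

Atom = tuple[float, float, float]


def min_distance(residue_atoms: list[Atom], substrate_atoms: list[Atom]) -> float:
    """Smallest Euclidean distance between any residue atom and any substrate
    atom, in the units of the coordinates (angstroms for PDB files).
    Brute force over all atom pairs, which is adequate for single residues."""
    return min(math.dist(a, b) for a in residue_atoms for b in substrate_atoms)
```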

Protein Kinase Specificity Assays 

Kinases and mutants were expressed by transient transfection of the encoding
plasmids in HEK293T cells and purified by FLAG affinity purification, and
positional scanning peptide library (PSPL) experiments were performed as
described (Mok et al., 2010). Further details can be found in the Supplemental
Experimental Procedures.
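PSPL read-outs are typically normalized per position before residues are compared. The sketch below assumes a scheme in which each spot is divided by the mean intensity of its position so that every position averages 1.0; the exact normalization used in the cited protocol should be checked there, and the names are hypothetical.

```python
def normalize_positions(
    intensities: dict[int, dict[str, float]],
) -> dict[int, dict[str, float]]:
    """Divide each spot by the mean intensity of its position (offset from the
    phosphoacceptor), so that every position averages 1.0 (assumed scheme)."""
    normalized: dict[int, dict[str, float]] = {}
    for position, spots in intensities.items():
        mean = sum(spots.values()) / len(spots)
        normalized[position] = {aa: value / mean for aa, value in spots.items()}
    return normalized
```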

Further details about the maintenance of cell lines, preparation of sequencing, 
mass spectrometry, and RNAi screening samples and their computational 
analysis can similarly be found in the Supplemental Experimental Procedures. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Supplemental Experimental Procedures, 
five figures, eight tables, and six data files and can be found with this article 
online at http://dx.doi.org/10.1016/j.cell.2015.08.056.
Highlights

• A reporter for endogenous genomic DNA methylation (RGM) 
is established 

• RGM can capture endogenous methylation state of 
promoters and non-coding regions 

• RGM allows tracing of methylation changes both in vitro and 
in vivo 

• RGM allows monitoring dynamics at single-cell resolution 
during cell-fate changes 
SUMMARY

Mammalian DNA methylation plays an essential role 
in development. To date, only snapshots of different 
mouse and human cell types have been generated, 
providing a static view on DNA methylation. To enable 
monitoring of methylation status as it changes over 
time, we establish a reporter of genomic methylation 
(RGM) that relies on a minimal imprinted gene pro- 
moter driving a fluorescent protein. We show that 
insertion of RGM proximal to promoter-associated 
CpG islands reports the gain or loss of DNA methylation.
We further utilized RGM to report endogenous
methylation dynamics of non-coding regulatory elements,
such as the pluripotency-specific super enhancers of
Sox2 and miR290. Locus-specific DNA
methylation changes and their correlation with
transcription were visualized during cell-state transitions
following differentiation of mouse embryonic stem 
cells and during reprogramming of somatic cells to 
pluripotency. RGM will allow the investigation of dy- 
namic methylation changes during development 
and disease at single-cell resolution. 

INTRODUCTION 

DNA methylation is recognized as a principal contributor to the 
stability and regulation of gene expression in development and 
maintenance of cellular identity (Bird, 2002; Cedar and Bergman, 
2012; Jaenisch and Bird, 2003; Reik et al., 2001). Changes in 
DNA methylation are dynamic and it is still largely unknown 
how they dictate spatial and temporal gene expression pro- 
grams (Smith and Meissner, 2013). Recent advancements in 
sequencing technologies enabled the establishment of methyl- 
ation maps for multiple cell types in both human (Kundaje 
et al., 2015; Schultz et al., 2015; Smith et al., 2014; Ziller et al., 
2013) and mouse (Hon et al., 2013), thus providing a framework 
for identifying key lineage-specific regulators (Rivera and Ren, 
2013). However, DNA methylation is a dynamic process, and current
methods rely on bulk measurements that provide only a static “snapshot” view of
the methylation state during cell-state transitions. The difficulty
in translating real-time epigenetic changes into a traceable
readout is, to date, a limiting factor in our ability to follow the
dynamics of DNA methylation. Therefore, a key challenge in the
field is to generate tools that allow tracing changes in DNA
methylation over time.

Here, we set out to generate a DNA methylation reporter sys- 
tem that is capable of visualizing genomic methylation states at 
single-cell resolution. The design of the reporter was based on 
two premises: (1) previous observations suggesting that CpG 
sites can serve as cis-acting signals, affecting the methylation
state of adjacent CpGs (Brandeis et al., 1994; Mummaneni 
et al., 1995; Turker, 2002), and (2) a methylation-sensitive pro- 
moter that, when introduced in proximity to a CpG region of 
choice, may be utilized to report on methylation changes of 
the adjacent sequences. Thus, a key issue in establishing a 
DNA methylation reporter was identifying a methylation-sensi- 
tive promoter, which is not independently regulated by the 
DNA methylation machinery, but can be affected by methyl- 
ation changes of adjacent sequences. Constitutively active 
genes usually contain hypomethylated high density CpG 
islands (CGIs) in their promoter regions and are not regulated 
by DNA methylation (Deaton and Bird, 2011), whereas gene
promoters associated with low-density CGIs are activated and
repressed in a tissue-specific manner. Because methylation of
these two classes of promoters is either not regulated by the DNA
methylation machinery in any tissue or is regulated in a
tissue-dependent manner, these promoters cannot be utilized as
DNA methylation reporters. In contrast, imprinted gene pro- 
moters exhibit inherent sensitivity to DNA methylation of adja- 
cent genomic regions resulting in transcriptional activation or 
silencing. This mechanism has been established for a subgroup 
of germline-derived differentially methylated regions (DMRs) 
that affect in cis the methylation state of secondary regulatory
promoter elements, which in turn control imprinted gene activ- 
ity. Importantly, following their establishment, promoter-associ- 
ated imprinted DMRs are not regulated by the DNA methylation 
machinery in a tissue-specific manner (Ferguson-Smith, 2011). 
We hypothesized that these intrinsic characteristics of im- 
printed gene promoters make them attractive candidates for 
methylation sensors. Perhaps one of the best-studied exam- 
ples is the Prader-Willi Angelman region, in which an imprinted 
DMR resides at the small nuclear ribonucleoprotein polypeptide 
N (Snrpn) gene promoter region controlling its parent-of-origin 
monoallelic expression (Buiting et al., 1995; Kantor et al., 
2004). Furthermore, Snrpn is expressed in most tissues
and thus serves as an attractive candidate to generate a DNA 
methylation reporter. 
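The distinction between high- and low-density CGIs rests on CpG content. As a worked sketch, the observed/expected CpG ratio is CpG count × sequence length / (C count × G count). The thresholds in `looks_like_cgi` are the classic island criteria (length ≥ 200 bp, GC fraction > 0.5, observed/expected > 0.6) and are an assumption here, since this study does not state which definition it used; function names are hypothetical.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, CpG observed/expected ratio) for a DNA sequence."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # CG dinucleotides cannot overlap, so count() is exact
    gc_fraction = (c + g) / len(s) if s else 0.0
    obs_exp = cpg * len(s) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cgi(seq: str) -> bool:
    """Classic CpG island criteria (assumed): >= 200 bp, GC > 0.5, o/e > 0.6."""
    gc, oe = cpg_stats(seq)
    return len(seq) >= 200 and gc > 0.5 and oe > 0.6
```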
Changes in DNA methylation occur mostly at non-CGIs, some of
which are associated with tissue-specific gene promoters (Jones,
2012). Nevertheless, a growing body of evidence suggests that the
bulk of tissue-specific changes in DNA methylation is associated 
with non-coding sequences (Irizarry et al., 2009) such as distal 
regulatory elements, which include enhancers and transcription 
factor binding sites (Hon et al., 2013; Stadler et al., 2011; Ziller 
et al., 2013). Recent reports identified super-enhancers (SEs) as
clusters of transcription factor (TF)- and Mediator-binding sites associated with bona
fide enhancer chromatin marks that control the expression of key
cell identity genes (Dowen et al., 2014; Hnisz et al., 2013; Whyte
et al., 2013). Global genomic comparisons of tissue-specific
DNA methylation and TF chromatin immunoprecipitation
sequencing (ChIP-seq) data correlated chromatin state
with methylation state (Xie et al., 2013). Accordingly, many
tissue-specific enhancers are hypomethylated in tissues where 
the target genes are expressed, but are hypermethylated in tis- 
sues where the target genes are silent (Hon et al., 2013). 

In this paper, we establish a reporter of genomic methylation 
(RGM) that enables the visualization of changes in DNA methylation
in live cells. We show that a minimal Snrpn promoter can
report on the DNA methylation state of endogenous gene pro- 
moters. We also generated reporter cell lines for the pluripo- 
tency-specific miR290 and Sox2 SEs and demonstrate that 
RGM can be used to capture dynamic DNA methylation changes 
in distal non-coding regulatory regions. An attractive aspect of 
RGM is its utility to visualize DNA methylation changes in devel- 
opment and disease at single-cell resolution in the same sample. 

RESULTS 

A Methylation-Sensitive Reporter System Based on a 
Minimal Imprinted Promoter 

To establish a methylation reporter, we generated a minimal 
Snrpn promoter that includes the conserved elements between 
human and mouse and contains the endogenous imprinted 
DMR region (Figure S1A). The minimal promoter region driving
GFP was cloned into a Sleeping Beauty transposon vector (Ivics
et al., 1997) to facilitate stable integration into the genome.
Recent studies have demonstrated that different CGI vectors, 
when stably inserted into mouse embryonic stem cells (mESCs), 
adopt a methylation pattern that corresponds to the in vivo 
methylation pattern of the respective endogenous sequence 
(Sabag et al., 2014). To test whether DNA methylation can
propagate into the Snrpn promoter region in vivo, we designed an
experimental system in which the CGI regions of Gapdh and
Dazl were cloned upstream of our reporter (Figure 1A). The
promoter of Gapdh encompasses a hypomethylated CGI, consistent
with constitutive expression in all tissues. In contrast, the Dazl
promoter-associated CGI is hypermethylated in all tissues
excluding the germ cells (Hackett et al., 2013). Given the different
expression and methylation patterns of both genes, upon stable
integration of the two reporter vectors into mESCs, the Gapdh
CGI is expected to maintain its hypomethylated state, while
the Dazl CGI would be subjected to de novo methylation (Sabag
et al., 2014). Figure 1B shows that >95% of cells carrying the
Gapdh reporter expressed GFP. In contrast, >30% of cells
carrying the Dazl reporter were GFP-negative, corresponding
to reporter silencing. Silencing of the Dazl reporter became
more robust upon continued passage, with >80% of the cells
silencing their reporter within 4 weeks (Figure 1B).
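The percentages reported for Figure 1B come from gating flow cytometry events on GFP intensity. A minimal sketch, assuming a single GFP threshold set from a non-fluorescent control and the sample standard deviation across biological replicates; the gate value and function names are hypothetical.

```python
import statistics


def percent_negative(gfp_intensities: list[float], gate: float) -> float:
    """Percentage of events with GFP intensity below the (hypothetical) gate."""
    if not gfp_intensities:
        raise ValueError("no events")
    negatives = sum(1 for x in gfp_intensities if x < gate)
    return 100.0 * negatives / len(gfp_intensities)


def mean_and_sd(values: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation across biological replicates."""
    return statistics.mean(values), statistics.stdev(values)
```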

To assess the DNA methylation levels of the Gapdh and Dazl
reporters following introduction into mESCs, we sorted Gapdh
GFP-positive and Dazl GFP-negative cell populations (Figure 1C).
The GFP expression state was stable upon continuous culture
and passaging of the two sorted cell populations for over 7 weeks
(Figure 1C). DNA was extracted from both Gapdh GFP-positive
and Dazl GFP-negative cells and subjected to bisulfite conversion
and PCR sequencing. Figure 1D shows that Gapdh GFP-positive
cells maintained the hypomethylated state at both the Gapdh CGI
and the Snrpn promoter regions, whereas Dazl GFP-negative
cells became highly de novo methylated at the Dazl CGI region
and its corresponding downstream Snrpn promoter (Figure 1E).
These results are consistent with the hypothesis that DNA
methylation can be propagated from the CGI into the Snrpn promoter
region, resulting in repression of transcriptional activity.
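Bisulfite sequencing separates methylated from unmethylated CpGs because bisulfite converts unmethylated cytosine to uracil (read as T after PCR), while 5-methylcytosine is protected and still reads as C. The sketch below calls each CpG from reads aligned base-for-base to the top strand, mirroring the open and filled circles in the figures. It ignores indels, the bottom strand, and conversion-efficiency checks, and the function names are hypothetical.

```python
from typing import Optional


def cpg_sites(reference: str) -> list[int]:
    """0-based positions of the C in every CpG dinucleotide of the reference."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def call_methylation(reference: str, read: str) -> dict[int, Optional[bool]]:
    """Per-CpG calls for one read aligned base-for-base to the top strand:
    C -> methylated (True), T -> unmethylated (False), otherwise None."""
    if len(reference) != len(read):
        raise ValueError("sketch assumes an ungapped, equal-length alignment")
    calls: dict[int, Optional[bool]] = {}
    for i in cpg_sites(reference):
        base = read[i].upper()
        if base == "C":
            calls[i] = True    # protected from conversion: methylated
        elif base == "T":
            calls[i] = False   # converted: unmethylated
        else:
            calls[i] = None    # mismatch or sequencing error: not informative
    return calls


def percent_methylated(per_read_calls: list[dict[int, Optional[bool]]]) -> float:
    """Percentage of informative CpG calls that are methylated, across reads."""
    informative = [c for calls in per_read_calls for c in calls.values() if c is not None]
    if not informative:
        raise ValueError("no informative CpG calls")
    return 100.0 * sum(informative) / len(informative)
```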

RGM Is a Reporter for In Vivo Demethylation 

The experiments described above showed that RGM reports on 
de novo methylation imposed in vivo on the unmethylated Dazl 
CGI donor test sequence. Conversely, we asked whether
a methylated and silent donor Snrpn promoter
can be reactivated by demethylation acquired in vivo.
For this, we used the CpG methyltransferase M.SssI to methylate
both Gapdh and Dazl reporter constructs in vitro. Treatment
of the plasmids with M.SssI enzyme followed by bisulfite
conversion, PCR amplification, and sequencing confirmed the
complete hypermethylation of both the CGI and Snrpn promoter
regions (Figures 2A, S1B, and S1C). ESCs were transfected
with either Gapdh or Dazl reporter and selected for cells carrying 
stably integrated vectors. Following 1 week of culture, we iden- 
tified robust activation of GFP in virtually all cells carrying the 
integrated Gapdh reporter, whereas cells carrying the Dazl re- 
porter remained GFP-negative (Figures 2B-2D). To assess the 
DNA methylation state of the Gapdh and Dazl CGI and the 
respective downstream Snrpn promoter regions, DNA was ex- 
tracted from the two cell lines, subjected to bisulfite conversion, 
PCR amplification and sequencing. Figure 2E demonstrates 
that, consistent with high GFP expression, the Gapdh CGI and 
its downstream Snrpn promoter had become fully demethylated. 
In contrast, the Dazl CGI and its downstream Snrpn promoter se- 
quences maintained the hypermethylated state in agreement 
with complete repression of the GFP signal (Figure 2F). Thus, 
our data support the hypothesis that a Snrpn promoter can 
report on in vivo demethylation of the CGI in its proximity. 

Dnmt1, Dnmt3a, and Dnmt3b Mediate Methylation and
Reporter Activity

We used ESCs deficient for the DNA methyltransferases Dnmt1,
Dnmt3a, and Dnmt3b to gain mechanistic insights into the
demethylation and de novo methylation imposed on the Snrpn promoter
in transfected ESCs. Figure 2G shows that introduction of an
in vitro methylated Dazl Snrpn vector into Dnmt1 mutant cells
resulted in ~80% GFP-positive cells by passage five, in contrast to
no GFP-positive cells when inserted into wild-type (WT) cells. In
agreement with the role of Dnmt1 as the maintenance DNA
Figure 1. An Active Minimal Snrpn Promoter Can Be Repressed in cis by Means of Spreading of DNA Methylation into the Promoter Region

(A) Schematic representation of the sleeping-beauty-based vectors. Endogenous CpG Islands (CGI) of Dazl and Gapdh genes were cloned upstream of a minimal 
Snrpn promoter region-driving GFP. Open circle lollipops schematically represent individual unmethylated CpG. 

(B) Flow cytometric analysis of V6.5 mESCs cultured in serum + LIF, following stable integration of unmethylated Gapdh and Dazl reporter vectors, demonstrating 
robust repression of GFP signal in the Dazl reporter cells over time. Shown are the mean percentages of GFP-negative cells ± STD of two biological replicates. 

(C) Phase and fluorescence images of the sorted V6.5 mESCs, comprising stable integration of the Gapdh (left) and Dazl (right) vectors following prolonged 
culturing for 7 weeks. 

(D and E) Bisulfite sequencing analysis of the stably transfected Gapdh (D) and Dazl (E) reporter cell lines was performed on the gene promoter-associated CGI 
(left) and the downstream Snrpn promoter region (right). Open circles represent unmethylated CpGs; Filled circles, methylated CpGs. 

See also Figure S1.



methyltransferase (Li et al., 1992), bisulfite sequencing analysis 
on the sorted GFP-positive cells confirmed that reactivation of 
the methylated Dazl reporter occurred by passive demethylation 
(Figure 2H). To clarify the mechanism of de novo methylation, we 
introduced an unmethylated version of both vectors into mESCs 
deficient for both de novo DNA methyltransferases Dnmt3a and 
Dnmt3b (Pawlak and Jaenisch, 2011). Figure 2I shows that the
vast majority of cells carrying the Dazl or the Gapdh reporters
were GFP-positive, unlike cells carrying the Dazl reporter in control
V6.5 cells (Figure 2I), which is consistent with Dnmt3a/b mediating
de novo methylation and reporter silencing.
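The passive demethylation observed in Dnmt1-deficient cells follows from semiconservative replication: without maintenance methylation, each replication pairs every parental strand with a new unmethylated strand, so the methylated fraction halves per division, m_n = m_0 · 2^-n. This idealized model (no residual maintenance activity, no de novo methylation by Dnmt3a/b, cell divisions rather than passages) is an assumption for illustration, and the function name is hypothetical.

```python
def passive_dilution(m0: float, divisions: int) -> float:
    """Expected methylated fraction at a CpG after `divisions` replications
    with no maintenance and no de novo methylation (idealized model)."""
    if not 0.0 <= m0 <= 1.0 or divisions < 0:
        raise ValueError("m0 must lie in [0, 1] and divisions must be >= 0")
    # Each replication adds one unmethylated daughter strand per parental
    # strand, so the methylated fraction halves with every division.
    return m0 / 2 ** divisions
```

Under this model, a fully methylated site would fall to 1/16 (about 6%) methylation after four divisions.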

Recent studies have shown that culturing mESCs in 2i medium
(inhibitors of MEK and GSK3) and leukemia inhibitory factor (LIF)
results in downregulation of Dnmt3a and Dnmt3b, consequently
leading to global hypomethylation (Lee et al., 2014). To assess
whether these culture conditions affect reporter activity, we
transfected the unmethylated Gapdh and Dazl reporters into WT
mESCs cultured in 2i and LIF. Figure 2I shows that the great
majority of the stably transfected cells were GFP-positive, consistent
with 2i-mediated downregulation of Dnmt3a and Dnmt3b.

RGM Can Report on Methylation Associated with 
Endogenous Gene Promoters 

To test whether the Snrpn promoter could also report on DNA 
methylation levels associated with endogenous gene promoters, 
we utilized CRISPR/Cas-mediated gene editing to target the
Figure 2. An In Vitro Repressed Snrpn Promoter Can Be Reactivated in cis by Means of Spreading of DNA Demethylation into the Promoter Region

(A) Schematic representation of in vitro methylated sleeping-beauty-based vectors. Closed circle lollipops schematically represent individual methylated CpG. 

(B) Phase and fluorescence images of the stably integrated V6.5 mESCs harboring Gapdh (left) and Dazl (right) in vitro methylated vectors, following 1 week of
antibiotic selection.

(C and D) Flow cytometric analysis of the proportion of GFP-positive cells in V6.5 mESCs, stably integrated with either Gapdh (C) or Dazl (D) in vitro methylated 
vectors, following 2 weeks in culture. 

(E and F) Bisulfite sequencing analysis of the stably transfected Gapdh (E) and Dazl (F) reporter cell lines was performed on the gene promoter-associated CGI
(left) and the downstream Snrpn promoter region (right). 

(G) Flow cytometric analysis of the proportion of GFP-positive cells in V6.5 mESCs and Dnmt1 KO mESCs, stably integrated with in vitro methylated Dazl reporter
vector.

(H) Bisulfite sequencing analysis of sorted GFP-positive Dnmt1 KO mESCs, stably integrated with in vitro methylated Dazl reporter vector.

(I) Flow cytometric analysis of the proportion of GFP-negative cells in control V6.5 mESCs, mESCs deficient for both Dnmt3a and Dnmt3b (Dnmt3ab KO) and V6.5
mESCs cultured in 2i + LIF, which were stably integrated with unmethylated Gapdh (top) and Dazl (bottom) reporter vectors. 

See also Figure S1.



endogenous CGIs located at the promoter regions of Gapdh
and Dazl (Figures 3A, S2A, and S2B). Figure 3B shows that 35/36
Dazl-vector-transfected clones were GFP-negative, indicating
robust silencing of the Dazl reporter, whereas 20/21
Gapdh-vector-transfected clones were GFP-positive. FACS
analysis of correctly targeted clones confirmed that Gapdh
reporter cells were all GFP-positive, with the CGI and Snrpn
promoter unmethylated (Figures 3C and 3D), in contrast to Dazl
GFP-negative clones with the corresponding sequences
methylated (Figures 3E and 3F). Our results demonstrate that Snrpn
reporter activity reports on the methylation state of its surrounding
sequences and does not alter their methylation state.
Furthermore, the endogenous targeting results suggested that the
partial repression of the Dazl reporter (Figure 1B), observed at early
passages of the transgene experiment, may be due to multiple
genomic integrations and position effects.
Figure 3. Generation of DNA Methylation
Reporter Cell Lines for Endogenous Gene 
Promoters 

(A) CRISPR/Cas-based strategy used to integrate the DNA methylation reporter into the endogenous promoter region of Gapdh and Dazl genes. TSS, transcription start site; green sequence, endogenous CGI region; black sequence, targeting CRISPR; red sequence, PAM recognition site.

(B) Flow cytometric analysis depicting the mean 
GFP intensity of randomly picked clones following 
antibiotic selection of both (top) Gapdh- and
(bottom) Dazl-reporter-transfected V6.5 mESCs.

(C) Flow cytometric analysis of the proportion of 
GFP-positive cells in two representative clones 
correctly targeted with the methylation reporter at 
the promoter region of Gapdh. 

(D) Bisulfite sequencing analysis was performed on mESCs harboring the DNA methylation reporter in the Gapdh promoter region. For each cell line, the PCR amplicon (marked with dashed line) includes both the endogenous CGI (left) and the downstream integrated Snrpn promoter region (right).

(E) Flow cytometric analysis of the proportion of 
GFP-positive cells in two representative clones 
correctly targeted with the methylation reporter at 
the promoter region of Dazl. 



(F) Bisulfite sequencing analysis was performed on mESCs harboring the DNA methylation reporter in Dazl promoter region. For each cell line, the PCR amplicon 
(marked with dashed line) includes both the endogenous CGI (left) and the downstream integrated Snrpn promoter region (right). 

See also Figure S2. 



RGM Can Report on Methylation of 
Pluripotency-Specific Super-Enhancers 

Methylation of super-enhancers (SEs) has been shown to change
during differentiation. We tested whether RGM would report 
on the active and hypomethylated state of the pluripotency- 
specific SEs associated with the miR290 and Sox2 genes in 
mESCs and their methylated and inactive state in somatic cells 
(Figures 4A and S3A). In contrast to the CGIs located at gene
promoters (Gapdh and Dazl), the SE regions of both Sox2 and
miR290 are low-density CpG sequences. Using CRISPR/Cas-mediated
gene editing, we inserted a Snrpn-tdTomato reporter into the
endogenous miR290 and Sox2 enhancers (Figures 4B and S3B,
respectively). As recipient cells, we used the previously
established Oct4, Sox2, Klf4, and c-Myc (OSKM) polycistronic
dox-inducible secondary reprogrammable mESCs (Carey et al.,
2011), which also carried a GFP reporter knocked into the
endogenous Nanog locus. Correct integration of the vector was
validated by PCR and Southern analysis (Figure S3C). Figure 4C
shows that both targeted ESC lines (miR290 #21 and Sox2 #2)
expressed tdTomato as well as Nanog-GFP. To assess whether the
tdTomato expression correlated with hypomethylation of the in- 
serted RGM, DNA extracted from the bulk mESCs population 
was bisulfite converted, amplified by PCR, and sequenced with 
the PCR amplification including both the SE CpG region and 
the downstream Snrpn promoter. As predicted from the methyl- 
ation maps (Figures 4A and S3A), both endogenous miR290 
and Sox2 CpG regions were mostly hypomethylated (Figure 4D). 
Importantly, the Snrpn promoter was also hypomethylated 
consistent with reporter expression. Of note, a few highly
methylated alleles were detected (Figure 4D), possibly reflecting
inherent variation in the bulk population due to the presence of
cells that carry an inactive reporter. To test this possibility, we
analyzed the Sox2 SE region in the untargeted parental cells, which
revealed both methylated and unmethylated alleles at the same
frequency as in the targeted reporter cell line (Figure S3D). We
conclude that RGM can report on the methylation
state of distal genomic regulatory regions. 
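The bisulfite readout used throughout these experiments rests on a simple rule: after conversion and PCR, an unmethylated cytosine reads as T, whereas a methylated CpG cytosine is protected and still reads as C. The Python sketch below shows how per-allele methylation and the number of "highly methylated" alleles might be scored from reads that are already aligned base-for-base to the reference. It is an illustration under stated assumptions, not the authors' analysis pipeline: the helper names, the toy sequences, and the 0.8 cutoff are all hypothetical.

```python
# Minimal sketch (hypothetical helper names): score CpG methylation from
# bisulfite reads that are already aligned base-for-base to the reference.


def cpg_positions(reference: str) -> list[int]:
    """Return 0-based indices of the C in every CpG of the reference strand."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def allele_methylation(reference: str, read: str) -> float:
    """Fraction of CpG sites called methylated in one bisulfite read (allele).

    Unmethylated C is converted and reads as T; methylated CpG C still reads
    as C. Positions showing any other base (e.g. a sequencing error) are skipped.
    """
    calls = []
    for i in cpg_positions(reference):
        base = read[i].upper()
        if base == "C":
            calls.append(1)  # protected -> methylated
        elif base == "T":
            calls.append(0)  # converted -> unmethylated
    if not calls:
        raise ValueError("read has no informative CpG sites")
    return sum(calls) / len(calls)


def summarize_alleles(reference: str, reads: list[str], high_cutoff: float = 0.8):
    """Return (mean per-allele methylation, number of highly methylated alleles).

    high_cutoff is an arbitrary illustrative threshold, not taken from the study.
    """
    per_allele = [allele_methylation(reference, r) for r in reads]
    mean_level = sum(per_allele) / len(per_allele)
    n_high = sum(1 for m in per_allele if m >= high_cutoff)
    return mean_level, n_high


reference = "ACGTTCGAACGT"  # toy sequence with three CpGs
reads = ["ATGTTTGAATGT", "ATGTTTGAATGT", "ACGTTCGAACGT"]
print(summarize_alleles(reference, reads))  # (0.3333333333333333, 1)
```

In this toy bulk sample, two fully unmethylated alleles and one fully methylated allele give a low average level while still containing one highly methylated allele, which is the kind of minority population that per-allele (lollipop) plots make visible and a single averaged value hides.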

Dynamic De Novo DNA Methylation during 
Differentiation 

To monitor real-time changes in genomic DNA methylation
during in vitro differentiation, mESCs carrying the tdTomato
reporters for DNA methylation at the SE regions were exposed to
retinoic acid (RA), which induces rapid exit from pluripotency and
cellular differentiation (Rhinn and Dollé, 2012). The Nanog-GFP
reporter allowed exit from pluripotency to be monitored as loss of
GFP expression. Sorted double-positive (tdTomato+/GFP+) miR290
and Sox2 cells were plated on feeder-free gelatin-coated plates,
treated with 0.25 μM RA the following day (Figure 5A), and
analyzed at different times after addition of RA (Figures 5A and
5B). As expected, undifferentiated cells were double-positive
(tdTomato+/GFP+). Upon induction of differentiation, however, the
fraction of double-positive cells gradually decreased, with most
disappearing over the 7-day time course, resulting in a largely
double-negative cell population (Figures 5B and 5C). This is in
contrast to control Gapdh reporter cells that, as expected,
appeared completely GFP-positive
Figure 4. Generation of DNA Methylation Reporter Cell Lines for the Pluripotent-Specific miR290 and Sox2 SE Regions

(A) Regional view depicting the DNA methylation (top) and chromatin (bottom) landscape of miR290 upstream pluripotent-specific SE. Shown are average 
methylation levels and enrichment of chromatin marks in mouse undifferentiated cells (green) and in adult tissues (gold), with respect to the genomic organization 
of the genes. DNA methylation ranges from 1 (hypermethylated) to 0 (hypomethylated). Characteristic clusters of typical enhancer marks and binding of tissue-specific
TFs define the SE region (light blue).

(B) CRISPR/Cas-based strategy used to integrate the DNA methylation reporter into the endogenous SE region. HR, homologous recombination; green 
sequence, endogenous miR290 CpG region; black sequence, targeting CRISPR; red sequence, PAM recognition site. 

(C) Phase and fluorescence images of correctly integrated DNA methylation reporter cell lines for miR290 (upper panel) and Sox2 (lower panel) endogenous SE 
regions. GFP marks endogenous expression levels of Nanog, whereas tdTomato reflects the endogenous DNA methylation levels at both miR290 and Sox2 SE 
regions. 

(D) Bisulfite sequencing analysis was performed on undifferentiated mESCs harboring the DNA methylation reporter in either miR290 SE region (top) or Sox2 SE 
region (bottom). For each cell line, the PCR amplicon (marked with dashed line) includes both the endogenous CGI (left) and the downstream integrated Snrpn 
promoter region (right). 

See also Figure S3. 



following 7 days of RA differentiation (Figure S4A). tdTomato and
Nanog-GFP-positive cells disappeared with different kinetics:
while single tdTomato-positive cells (tdTomato+/GFP-) appeared
after 2 days, only a few single Nanog-GFP-positive cells
(tdTomato-/GFP+) were detected during differentiation (Figures
5B and 5C), suggesting that Nanog was silenced prior to
methylation and silencing of the miR290 and Sox2 SEs.
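The four populations tracked in Figures 5B and 5C (R, tdTomato; G, GFP) come from splitting each cell by two fluorescence thresholds. The Python sketch below shows that quadrant logic and how per-population fractions, like those in the bar graph, follow from it. It is a minimal illustration: the function names, the intensity values, and the gate positions are hypothetical, and real gates are set on the cytometer rather than taken from this code.

```python
from collections import Counter

# Quadrant labels follow the R (tdTomato) / G (Nanog-GFP) notation.
QUADRANTS = ("R+/G+", "R+/G-", "R-/G+", "R-/G-")


def classify_event(tdtomato: float, gfp: float, r_gate: float, g_gate: float) -> str:
    """Assign one flow cytometry event to a quadrant using two intensity thresholds."""
    r = "R+" if tdtomato >= r_gate else "R-"
    g = "G+" if gfp >= g_gate else "G-"
    return f"{r}/{g}"


def population_fractions(events, r_gate: float, g_gate: float) -> dict[str, float]:
    """Fraction of events falling into each quadrant."""
    counts = Counter(classify_event(r, g, r_gate, g_gate) for r, g in events)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no events supplied")
    return {q: counts[q] / total for q in QUADRANTS}  # missing quadrants count as 0


# Hypothetical (tdTomato, GFP) intensities for four cells and arbitrary gates.
events = [(5000, 8000), (6000, 50), (40, 30), (30, 20)]
print(population_fractions(events, r_gate=1000, g_gate=1000))
# {'R+/G+': 0.25, 'R+/G-': 0.25, 'R-/G+': 0.0, 'R-/G-': 0.5}
```

Tracking these four fractions over the time course is what reveals the order of events: a growing R+/G- fraction with an almost empty R-/G+ quadrant is the pattern read here as Nanog silencing preceding SE methylation.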

To confirm that loss of the tdTomato signal correlated with
accumulation of de novo methylation in both SE regions, we
sorted the main populations at different time points during RA
differentiation (Figure 5C). DNA was extracted from the different
cell populations and subjected to bisulfite sequencing, allowing
a comprehensive analysis of the methylation state of both the
endogenous miR290 and Sox2 SEs and their respective Snrpn
promoter regions (Figures 5D, 5E, S4B, and S4C). In contrast to
the bulk population of mESCs (Figure 4D), the sorted
double-positive cells did not harbor completely methylated
alleles, consistent with the notion that methylated alleles in the
Figure 5. Dynamics of De Novo DNA Methylation of miR290 and Sox2 SE Regions upon In Vitro Differentiation

(A) Schematic representation of the RA-based differentiation protocol used on miR290 and Sox2 reporter cell lines. GFP marks endogenous expression levels of 
Nanog, whereas tdTomato reflects the endogenous DNA methylation levels at both miR290 and Sox2 SE regions. 

(B) Flow cytometric analysis of the proportion of Nanog-GFP-positive cells (x axis) and tdTomato-positive cells (y axis) during 7 days of differentiation of miR290
#21 (top) and Sox2 #2 (bottom) reporter cell lines. 

(C) Bar graph summarizing the proportion of the different cell populations during the course of 7 days RA differentiation for both miR290 #21 (top) and Sox2 #2 
(bottom) reporter cell lines. Data represent two biological replicates. R, tdTomato; G, GFP. 

(D and E) Bisulfite sequencing analysis on the three main cell populations sorted at 48 hr following initial treatment with RA. For both miR290 #21 (D) and Sox2 #2 
(E) cell lines, the PCR amplicon (marked with dashed line) includes the endogenous CGI (left) and the downstream integrated Snrpn promoter region (right). R, 
tdTomato; G, GFP. 

See also Figure S4. 



bulk population reflect intrinsic variation. Methylation of both
miR290 and Sox2 in single-positive cells (tdTomato+/GFP-) was
low, consistent with tdTomato expression. The overall increase in
de novo methylation in the single-positive cells, compared with
the double-positive cells, may indicate that DNA
methylation-mediated silencing had already been initiated in this
intermediate cell population. Notably, our analysis identified
completely methylated alleles in the Sox2 single-positive
(tdTomato+/GFP-) cell population (Figure 5E). This suggests that
during rapid de novo methylation, the half-life of the fluorescent
protein (FP) may lead to an overestimation of cells that are still
hypomethylated during cell-state transitions. Finally, in agreement
with the silencing of tdTomato expression, the double-negative
cells (tdTomato-/GFP-) exhibited robust hypermethylation of both
endogenous SE regions and their respective Snrpn promoters
(Figures 5D, 5E, S4B, and S4C).
To test whether methylation of the targeted reporter allele
correlated with that of the untargeted (WT) allele, we analyzed
the WT allele in Sox2 reporter cells at different time points
during differentiation. Figure S4D shows that, similar to the
reporter allele, the WT allele exhibited low levels of methylation
in the sorted double-positive cells and high levels of methylation
following 7 days of differentiation. We conclude that RGM allows
dynamic monitoring of de novo methylation events that are imposed
on genomic sequences upon exit from pluripotency. Our data
suggest that differentiation of ESCs induces silencing of Nanog
prior to de novo methylation of the miR290 and Sox2 SEs.

To test whether in vivo differentiation resulted in silencing of 
the tdTomato reporter in both miR290 and Sox2 SE regions, 
Figure 6. Dynamics of DNA Demethylation of miR290 and Sox2 SE Regions during Cellular Reprogramming

(A) miR290 (top) and Sox2 (bottom) reporter chimeric experimental embryos (right embryo in each panel). As controls, Gapdh CGI reporter mESCs driving GFP 
and constitutively expressing tdTomato (Control, Gapdh-GFP, and tdTomato, respectively) were injected into host blastocysts (left embryo in each panel). 

(B) Schematic representation of the experimental procedure to monitor the dynamics of demethylation during reprogramming of miR290 and Sox2 reporter cell 
lines. GFP marks endogenous expression levels of Nanog, whereas tdTomato reflects the endogenous DNA methylation levels at both miR290 and Sox2 SE 
regions. 

we analyzed 13.5 dpi chimeric embryos. As a control, we injected
ESCs harboring the Gapdh CGI reporter driving GFP, which had
also been infected with lentiviruses resulting in constitutive
expression of tdTomato. The robust expression of GFP in the
Gapdh control embryos demonstrated the widespread activity of
the Snrpn promoter throughout mouse tissues (Figure 6A). Unlike
the Gapdh control, both miR290 and Sox2 embryos were completely
negative for both GFP and tdTomato, demonstrating robust
repression of Nanog and the Snrpn promoter during in vivo
differentiation (Figure 6A).

DNA Demethylation during Cellular Reprogramming 

Reprogramming of somatic cells to iPS cells involves demethy- 
lation and activation of the pluripotency SEs Sox2 and miR290 
(see Figures 4A and S3A). We investigated whether RGM could 
be used to capture demethylation events that are gradually 
acquired during cellular reprogramming. For this, we used sec- 
ondary Dox-inducible reprogrammable mouse embryonic fibro- 
blasts (MEFs) isolated from 13.5 dpi chimeric embryos that had 
been injected at the blastocyst stage with the OSKM DOX-induc- 
ible ESCs (Carey et al., 201 1) carrying Nanog-GFP and the tdTo- 
mato reporter reflecting DNA methylation levels at the Sox2 or 
miR290 SE alleles (see Figure 6B). Culture of these MEFs in 
DOX induces the reprogramming factors while Nanog-GFP acti- 
vation allows monitoring the course of reprogramming in the bulk 
somatic cell population (Buganim et al., 2012). As expected, 
MEFs isolated from 13.5 dpi embryos were negative for both 
GFP and tdTomato expression, as measured by fluorescent mi- 
croscopy and fluorescence-activated cell sorting (FACS) anal- 
ysis (Figures 6C and S5A). Importantly, consistent with tdTomato 
repression, both endogenous miR290 and Sox2 SE regions as 
well as their corresponding downstream Snrpn promoter regions 
were hypermethylated (Figure 6D). Further analysis of the WT 
allele in Sox2 MEF showed high correlation with the targeted re- 
porter allele, demonstrating robust repression of the SE region 
in vivo (Figure S5B). 

To test whether reprogramming-induced demethylation can
be visualized by RGM, we treated the secondary MEFs with
serum and LIF medium supplemented with 2 μg/ml doxycycline
(Dox). Both miR290 and Sox2 MEFs were successfully
reprogrammed, resulting in double-positive cells
(tdTomato+/GFP+, data not shown). It was recently shown that a
combination of three chemicals, the TGF-β antagonist ALK5
inhibitor II, the GSK3b antagonist CHIR99021, and ascorbic acid,
an enzymatic cofactor (from here on referred to as 3C), results in
more efficient and synchronous reprogramming (Vidal et al., 2014).
To achieve more synchronized and efficient reprogramming, both
miR290 and Sox2 MEFs were subjected to 3C culture conditions and
the dynamics of reporter activation were monitored by flow
cytometry.
While the first tdTomato+ and GFP+ cells emerged at day 16
(Figure 6E), reporter activation of miR290 and Sox2 occurred with
different kinetics. Figure 6E shows the accumulation over time of
miR290 reporter cells that activated both GFP and tdTomato
(tdTomato+/GFP+). A small population of single-positive GFP cells
appeared in late stages of reprogramming, consistent with a
stochastic sequence of events in the reprogramming of the miR290
SE region. Unlike miR290 reporter cells, however, Sox2 cells
showed more robust and defined activation dynamics for both
reporters. By day 16, a population of single-positive GFP cells
(tdTomato-/GFP+) had accumulated, which gradually shifted to
become double-positive (tdTomato+/GFP+) over time (Figures 6E and
S5C). To test whether the single-positive GFP cells give rise to
double-positive cells, we sorted the single-positive GFP cells and
replated them on feeders under Dox-independent culture
conditions. Consistent with the repression of the tdTomato signal,
bisulfite sequencing confirmed that the single-positive GFP cells
exhibit high levels of methylation in the SE region, as well as in
the downstream Snrpn promoter region (Figure S5D). Upon further
culture, tdTomato-positive cells appeared, demonstrating that
single-positive GFP cells give rise to double-positive cells
(Figure S5E).

Our results suggest that reprogramming of both the miR290 and
Sox2 SE regions is a late event, with the Sox2 SE region being
reprogrammed after activation of endogenous Nanog. miR290 and
Sox2 double-positive (tdTomato+/GFP+) cells invariably proceed
to a Dox-independent iPS cell state (Figure 6F). To assess the
methylation state of the Sox2 and miR290 SEs, we performed
bisulfite sequencing on DNA extracted from sorted double-positive
(tdTomato+/GFP+) iPS cells. As shown in Figure 6G, both miR290
and Sox2 SE regions and their corresponding downstream Snrpn
promoters were demethylated. These results confirm that RGM can
visualize demethylation of regulatory genomic regions during
reprogramming with single-cell resolution.

DISCUSSION 

In this work, we have generated a DNA methylation reporter 
(RGM) that allows imaging of DNA methylation with single-cell 
resolution. The design of the reporter system took advantage 
of the intrinsic characteristics of imprinted gene promoters, for 
which the transcriptional activity reflects the DNA methylation 
state of adjacent sequences. Importantly, imprinted promoters 



(C) Flow cytometric analysis of the proportion of GFP-positive cells (x axis) and tdTomato-positive cells (y axis) in P0 MEFs derived from miR290 #21 (left) and 
Sox2 #2 (right) chimeric embryos. 

(D) Bisulfite sequencing analysis was performed on P0 MEFs derived from miR290 #21 (top) and Sox2 #2 (bottom) chimeras. For each cell line, the PCR amplicon 
(marked with dashed line) includes both the endogenous CGI (left) and the downstream integrated Snrpn promoter region (right). 

(E) Analysis of the proportion of GFP-positive cells (x axis) and tdTomato-positive cells (y axis) during the course of reprogramming of MEFs derived from miR290
#21 (upper panel) and Sox2 #2 (lower panel) chimeras. Shown are flow cytometric data from different time points following addition of Dox under 3C culture
conditions.

(F) Representative images of established miR290 and Sox2 iPSC lines, derived from sorted double-positive (tdTomato+/GFP+) colonies.

(G) Bisulfite sequencing analysis was performed on P2 iPSCs derived from miR290 #21 (top) and Sox2 #2 (bottom) MEFs. For each cell line, the PCR amplicon 
(marked with dashed line) includes both the endogenous CGI (left) and the downstream integrated Snrpn promoter region (right). 

See also Figure S5. 
are neutral to developmental or tissue-specific DNA
methylation changes, with their activity strictly dependent on
the methylation state of the adjacent regulatory elements. This
is in contrast to CGI sequences such as Gapdh or tissue-specific
elements such as the Dazl promoter-associated sequences, which
become demethylated or de novo methylated, respectively, when
inserted into the genome of ESCs (Brandeis et al., 1994; Sabag
et al., 2014). This indicates that methylation of these elements,
as opposed to imprinted promoters, is sequence-dependent and
subject to trans-acting signals and cell state-dependent
regulation.

The RGM reporter system described here is based on the Snrpn
minimal promoter, which is not itself subject to methylation
changes; GFP expression therefore depends solely on the
methylation state of the surrounding sequences. Consistent with
this premise, ES cells were GFP-positive when stably transfected
with the methylated or unmethylated Gapdh/Snrpn-GFP vector, but
GFP-negative when transfected with the methylated or
unmethylated Dazl/Snrpn-GFP reporter. This indicates that the
Snrpn promoter region can be used as a faithful sensor for
regional methylation changes of adjacent sequences.

To investigate whether RGM can report on the methylation 
state of endogenous loci, we targeted CGIs located at Gapdh 
and Dazl promoter regions, resulting in differential methylation 
and activity of the Snrpn reporter. Thus, the Snrpn promoter 
effectively reflects local methylation patterns without affecting 
the endogenous epigenetic state. As most of the tissue-specific 
DNA methylation changes occur in low-density CpG regulatory 
regions, we asked whether RGM could report on the methylation 
state of non-coding low-density CpG regions. We chose two plu- 
ripotency-specific SEs that are associated with the miR290 and 
Sox2 genes and are known to be active and unmethylated in 
ESCs but become methylated and inactive upon cellular differ- 
entiation. CRISPR/Cas-mediated insertion of the Snrpn-tdTo- 
mato reporter into ESCs resulted in tdTomato-positive clones 
but tdTomato expression was silenced in mid-gestation chimeric
embryos, reflecting the unmethylated state of the SEs in
pluripotent cells and their de novo methylation upon
differentiation. Consistent with this, MEFs isolated from
chimeric embryos were tdTomato-negative with both elements 
highly methylated. Upon conversion of the MEFs into induced 
pluripotent stem cells (iPSCs), however, the cells became 
tdTomato-positive reflecting demethylation of the SEs during 
reprogramming to pluripotency. Our results establish that RGM 
reporter activity mirrors the changes in DNA methylation imposed
on endogenous CGI and low-density CpG genomic elements during
development, upon cellular differentiation, and during
reprogramming. Extensive epigenomic analyses of multiple tissues
and cell types in both humans and mice suggest that embryonic
development and cell-type specification are associated with
massive epigenomic remodeling at discrete enhancers (Hon et al.,
2013; Kundaje et al., 2015; Schultz et al., 2015; Ziller et al.,
2013). It will thus be of interest to test whether RGM can be
used to report on the DNA methylation state of more discrete
regulatory regions. Applying the methylation reporter to
tissue-specific differentially methylated regions (DMRs) holds the
promise of further elucidating the link between DNA methylation,
other epigenetic mechanisms, and cell-fate regulation.



Reprogramming of somatic cells into iPSCs involves extensive
resetting of the epigenome (Buganim et al., 2013; Hanna et al.,
2010), and consistent with this notion, recent studies identified
a key role for epigenetic modifiers during this process (Mansour
et al., 2012; Rais et al., 2013; Soufi et al., 2012). However, the
exact kinetics of these epigenetic changes during the reprog- 
ramming process are difficult to define because of cell heteroge- 
neity and the stochastic nature of the reprogramming process. 
Here, we followed the methylation changes of two SEs associ- 
ated with Sox2 and miR290, demonstrating that demethylation 
of both regions is a late event in the reprogramming process. 
Simultaneous activation of endogenous Nanog and miR290 SE
demethylation is consistent with Nanog directly regulating the
expression of the miR290 cluster during reprogramming to iPS cells
(Gingold et al., 2014). The gradual activation of the Sox2
tdTomato reporter followed expression of endogenous Nanog,
consistent with demethylation of the Sox2 SE being a late event
in the process (Buganim et al., 2012). Systematic deletion of the
Sox2 upstream SE region was recently shown to dramatically
affect Sox2 expression in ESCs (Li et al., 2014; Zhou et al.,
2014). Thus, the Sox2 SE methylation reporter cells provide a
rigorous experimental system to investigate how DNA methylation
changes at distal regulatory regions influence the expression of
downstream target genes.

Changes in DNA methylation during development, lineage 
commitment, and disease are dynamic, and studies of epigenetic 
changes are hampered by two experimental constraints that limit 
mechanistic studies of methylation and gene regulation: (1) cur- 
rent methodology provides only a static “snapshot” view of the 
methylation state during cell state transitions, and (2) current 
methylation analyses require the examination of multiple cells 
precluding assessment of epigenetic changes in single cells. 
Given the overwhelming evidence of cell-cell heterogeneity in 
embryos, cultured cells, or disease states such as cancer (Junker 
and van Oudenaarden, 2014), this is a serious limitation for a 
mechanistic understanding of the epigenetic state and gene 
expression during these complex processes. For example, moni- 
toring the course of differentiation in both miR290 and Sox2 re- 
porter cells confirmed the co-existence of cell populations that 
harbor distinct epigenetic states. In contrast, commonly used 
bulk methodologies would not allow isolating and distinguishing 
the different cell populations. Thus, sorting and isolating
different cell types according to their methylation states can be
achieved only with a readout of methylation state at single-cell
resolution. The RGM reporter system overcomes some of the
limitations of conventional methylation analyses by providing
real-time visualization of DNA methylation at single-cell
resolution. As with any fluorescent protein-based reporter
system, the accuracy with which real-time changes can be traced
depends on the half-life of the respective FP. Because the current
version of the methylation reporter does not use a destabilized
FP, silencing of the reporter after de novo methylation-induced
repression of the Snrpn promoter is likely delayed. Generating a
reporter that responds more rapidly to DNA methylation changes
would require the use of a destabilized FP. Targeting additional
loci in future studies will allow us to further elucidate other
possible limitations of the RGM reporter system, such as
inhibition of Snrpn transcriptional activity by chromatin
conformation.
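The delay caused by FP half-life can be made concrete with a first-order decay model. The Python sketch below assumes that once the Snrpn promoter is silenced no new FP is made, and that existing FP is lost exponentially with a single effective half-life (degradation plus dilution by division). Both half-life values in the example are hypothetical and chosen only for illustration; they were not measured in this study.

```python
import math

# Simplifying assumption: after promoter silencing, fluorescence decays
# exponentially with one effective half-life and no new FP is produced.


def fraction_remaining(hours: float, half_life_h: float) -> float:
    """Fraction of the starting fluorescence left `hours` after silencing."""
    return 0.5 ** (hours / half_life_h)


def hours_until_fraction(fraction: float, half_life_h: float) -> float:
    """Hours after silencing until fluorescence falls to `fraction` of its start."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return half_life_h * math.log2(1 / fraction)


# Hypothetical effective half-lives, for illustration only.
for label, t_half in (("stable FP", 24.0), ("destabilized FP", 2.0)):
    print(label, round(hours_until_fraction(0.1, t_half), 1))
# stable FP 79.7
# destabilized FP 6.6
```

Under these assumed values, a cell whose reporter was silenced at time zero would still appear tdTomato-positive for roughly three days with a stable FP, but for only a few hours with a destabilized one. This is consistent with the observation that fully methylated alleles were found in sorted tdTomato-positive cells.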
As RGM allows measuring dynamics of DNA methylation at
single-cell resolution, it provides a framework for understanding
epigenetic changes during cell state transitions in heterogeneous
cell populations. For example, replacing the fluorescence-based
reporter system with Cre-Lox will enable the generation of
epigenetic lineage tracing maps. Furthermore, using RGM together
with conventional gene expression reporters may offer detailed 
insights into the interplay between epigenetic cues and the 
execution of tissue-specific gene expression programs. The 
use of fluorescent reporters as readout for locus-specific methyl- 
ation changes may also provide an effective screening platform 
for the isolation of small molecule compounds that affect the 
methylation state of specific genomic regions. 

EXPERIMENTAL PROCEDURES

mESC Culture

V6.5 mouse embryonic stem cells (mESCs) were cultured on irradiated mouse
embryonic fibroblasts (MEFs) in standard ESC medium (500 ml): DMEM
supplemented with 10% FBS (HyClone), 10 μg recombinant leukemia inhibitory
factor (LIF), 0.1 mM beta-mercaptoethanol (Sigma-Aldrich),
penicillin/streptomycin, 1 mM L-glutamine, and 1% nonessential amino acids
(all from Invitrogen). For experiments in 2i culture conditions, mESCs were
cultured on gelatin-coated plates in N2B27 + 2i + LIF medium (500 ml)
containing 240 ml DMEM/F12 (Invitrogen; 11320), 240 ml Neurobasal medium
(Invitrogen; 21103), 5 ml N2 supplement (Invitrogen; 17502048), 10 ml B27
supplement (Invitrogen; 17504044), 10 μg recombinant LIF, 0.1 mM
beta-mercaptoethanol (Sigma-Aldrich), penicillin/streptomycin, 1 mM
L-glutamine, and 1% nonessential amino acids (all from Invitrogen), 50 μg/ml
BSA (Sigma), PD0325901 (Stemgent, 1 μM), and CHIR99021 (Stemgent, 3 μM).

Reporter Cell Lines 

To generate stably integrated Gapdh and Dazl transgene reporter cell lines,
either the Gapdh- or Dazl-modified PiggyBac transposon (see Supplemental
Experimental Procedures) and a helper plasmid expressing transposase
were transfected into mESCs using Xfect mESC Transfection Reagent
(Clontech), according to the manufacturer's protocol. Stably integrated
reporter cells were selected with puromycin (2 mg/ml) for 4 days.

To generate Dazl, Gapdh, miR290, and Sox2 SE reporter cell lines, targeting
vectors and CRISPR/Cas9 were transfected into mESCs using Xfect mESC
Transfection Reagent (Clontech), according to the manufacturer's protocol.
Forty-eight hours after transfection, cells were FACS-sorted for GFP or
tdTomato expression (respectively) and plated on MEF feeder plates. Single
colonies were further analyzed for correct, single-copy integration by
Southern blot and PCR analysis.

Flow Cytometry 

To assess the proportion of GFP- and tdTomato-positive cells in the
established reporter cell lines, single-cell suspensions were filtered and
analyzed on an LSR II SORP, LSRFortessa SORP, or FACSCanto II.

Retinoic Acid-Induced Differentiation 

mESCs carrying the reporter for either the miR290 or the Sox2 SE region were
sorted for double-positive GFP and tdTomato expression and plated on
gelatin-coated plates in ES cell medium (+LIF). The next day, cells were
washed with PBS, resuspended in basal N2B27 medium (2i medium without LIF,
insulin, and the two inhibitors), and supplemented with 0.25 μM RA. Medium
was replaced every other day.

Blastocyst Injections for the Generation of Chimeras and Secondary 
MEFs 

Blastocyst injections were performed using (C57Bl/6 x DBA) B6D2F2 host
embryos. In brief, B6D2F1 females were hormone primed by an intraperitoneal
(i.p.) injection of pregnant mare serum gonadotropin (PMS, EMD Millipore),
followed 46 hr later by an injection of human chorionic gonadotropin (hCG, VWR).
Embryos were harvested at the morula stage and cultured in a CO2 incubator
overnight. On the day of the injection, groups of embryos were placed in drops
of M2 medium using a 16-μm diameter injection pipet (Origio). Approximately
ten cells were injected into the blastocoel cavity of each embryo using a Piezo
micromanipulator (Prime Tech). Approximately 20 blastocysts were subsequently
transferred to each recipient female; the day of injection was considered
2.5 days postcoitum (DPC). Fetuses were collected at 13.5 DPC for the
extraction of embryonic fibroblasts as described before (Buganim et al., 2012).

Southern Blots 

Genomic DNA (10-15 μg) was digested with appropriate restriction enzymes
overnight. Subsequently, genomic DNA was separated on a 0.7% agarose
gel, transferred to a nylon membrane (Amersham), and hybridized with ³²P
random primer (Stratagene)-labeled probes.

Reprogramming to iPSCs 

MEFs isolated from miR290 and Sox2 fetuses were plated at a density of 50,000
cells per well of gelatin-coated 6-well plates in standard MEF medium (mESC
medium without LIF). The following day, MEF medium was replaced with mESC
medium containing 2 μg/ml doxycycline (Sigma). Alternatively, cells were
grown in mESC medium containing 2 μg/ml doxycycline and a combination of
three compounds (the TGF-β antagonist ALK5 inhibitor II, the GSK3b antagonist
CHIR99021, and ascorbic acid) as described before (Vidal et al., 2014).
Medium was replaced every other day during the course of reprogramming.
SUMMARY

Embryonic stem cells (ESCs) repress the expression
of exogenous proviruses and endogenous retroviruses
(ERVs). Here, we systematically dissected the
cellular factors involved in provirus repression in
embryonic carcinoma cells (ECs) and ESCs using a
genome-wide siRNA screen. Histone chaperones (Chaf1a/b),
sumoylation factors (Sumo2/Ube2i/Sae1/Uba2/Senp6),
and chromatin modifiers (Trim28/Eset/Atf7ip) are key
determinants that establish provirus silencing.
RNA-seq analysis uncovered the roles of Chaf1a/b and
sumoylation modifiers in the repression of ERVs.
ChIP-seq analysis demonstrates direct recruitment of
Chaf1a and Sumo2 to ERVs. Chaf1a reinforces
transcriptional repression via its interaction with
members of the NuRD complex (Kdm1a, Hdac1/2) and
Eset, while Sumo2 orchestrates the provirus repressive
function of the canonical Zfp809/Trim28/Eset machinery
by sumoylation of Trim28. Our study reports a
genome-wide atlas of functional nodes that mediate
proviral silencing in ESCs and illuminates the
comprehensive, interconnected, and multi-layered
genetic and epigenetic mechanisms by which ESCs
repress retroviruses within the genome.

INTRODUCTION 

The expression of proviruses and endogenous retroviruses 
(ERVs) is restricted in pluripotent stem cells (Feuer et al., 1989; 
Niwa et al., 1983; Teich et al., 1977). This silencing has likely 
evolved for the protection of germline cells from insertional muta- 
genesis (Gaudet et al., 2004; Walsh et al., 1998). The expression 
and DNA methylation profiles of the Moloney murine leukemia 
virus (MMLV) have been investigated in embryonic carcinoma 
Figure 1. Genome-wide siRNA Screen for Regulators of Proviral Silencing in Mouse F9 ECs

(A) Schematic of the proviral MMLV-Gfp reporter assay. The map of the proviral reporter is shown (upper panel). LTR (black) indicates the long terminal repeats,
while PBS (blue) represents the primer binding site. F9 cells were infected with the reporter virus and subjected to reverse transfection with the siRNA library in
384-well plates. A representative image of Gfp fluorescence (green) and nuclear Hoechst 33342 staining (blue) in a 384-well plate is shown. Each 384-well
plate included a non-targeting siRNA control (siNT) and positive-control siRNAs against Trim28 and Eset (siTrim28 and siEset).
cells (ECs) and embryonic stem cells (ESCs) (Niwa et al., 1983).
DNA methylation is thought to repress the expression of viral
genes in differentiated cells, while repression in pluripotent cells
is mediated both by cis-acting de novo methylation of the
integrated proviruses (Gaudet et al., 2004; Walsh et al., 1998) and by
trans-acting transcriptional repressors (Petersen et al., 1991;
Stewart et al., 1982; Walsh et al., 1998; Wolf et al., 2008a; Wolf
and Goff, 2007).

It has been reported that many ERVs affect cellular gene activity
by acting as alternative promoters or enhancers (Peaston
et al., 2004). For example, MERVL is transiently activated during
the mouse two-cell (2C) stage, regulating the expression of
2C-specific genes (Macfarlan et al., 2012). ERVs may also function in
the reprogramming of somatic cells into induced pluripotent
stem cells (iPSCs): specific ERVs are reactivated during the
reprogramming process, while other classes of ERVs have to
be silenced to attain complete reprogramming (Friedli et al.,
2014; Wissing et al., 2012). Together, these studies suggest
that proviral silencing is a characteristic of the pluripotent state
and that the precise expression of ERVs has critical roles during
embryogenesis and development.

Various studies have implicated diverse epigenetic mecha- 
nisms in the silencing of retroviruses and ERVs. Repression is 
thought to be dependent on a conserved sequence element 
termed the primer binding site (PBS). Factors such as Zfp809, 
Trim28, and Eset are responsible for mediating the H3K9me3 
repressive silencing mechanism (Friedli et al., 2014; Rowe 
et al., 2010; Wolf and Goff, 2007, 2009; Wolf et al., 2008b). 
Eset was shown to be involved in the repression of retroviruses 
and subfamilies of ERVs, predominantly of class I and II ERVs 
(Karimi et al., 2011; Matsui et al., 2010). More recently, viral-silencing
factors such as the zinc finger protein Yin yang 1
(Yy1), ErbB3 binding protein 1 (Ebp1), and the polycomb repressive
complex 2 (PRC2) catalytic subunit Ezh2 have been described
(Schlesinger et al., 2013; Schlesinger and Goff, 2013; Wang et al., 2014).
Other studies reporting the role of host factors governing ERVs in
model organisms, such as Saccharomyces cerevisiae (Maxwell and
Curcio, 2007), have also provided critical evolutionary insight into
the dynamics of retroviral regulation.

Despite many efforts to identify the factors involved, many
components of the epigenetic machinery required for stable
silencing of proviruses and ERVs remain poorly characterized.
To advance our understanding, we developed a high-throughput
screening approach based on an MMLV-Gfp provirus
reporter (Schlesinger et al., 2013) and genome-wide small
interfering RNA (siRNA) knockdown. Our screen identified 303
determinants of viral silencing in mouse ESCs with high confidence
and provides a genome-wide functional interrogation of
determinants mediating proviral silencing in pluripotent embryonic
stem cells.

RESULTS 

Unbiased Genome-wide siRNA Screen for Determinants 
of Proviral Silencing in Embryonic Carcinoma Cells 

To define the factors involved in the silencing process, we developed
a high-throughput screening approach based on an MMLV-Gfp provirus
reporter and siRNA knockdown in F9 ECs (Figure 1A).
F9 cells were infected with the MMLV-Gfp virus and then reverse
transfected with siRNA in 384-well plates. Gfp expression on
day 4 post-infection indicated retrovirus activation.

We first confirmed the sensitivity of the reporter assay via
knockdown of the canonical repressive genes Trim28 and Eset.
Consistently, imaging and fluorescence-activated cell sorting
(FACS) analyses showed that knockdown of each factor dramatically
relieved the repression of retroviral Gfp (Figures S1A and
S1B). We next carried out a pilot screen with a kinome siRNA
library in F9 cells, using non-targeting (siNT), Trim28, and Eset
siRNAs as controls. The kinome library screen was evaluated
by Z-prime (Z′-factor) score (Figures S1C-S1F). From this screen, we
identified both known (Trim28 and Cdk9) and previously uncharacterized
factors (Chuk, Epha4, Csnk1e, Sgpp1, and Npp4a) responsible for
retrovirus silencing (Figure S1G). Cdk9 was previously reported to
interact with the HIV-1 Tat protein and regulate HIV-1 transcription
(Kao et al., 1987).
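The Z-prime score is commonly defined as the Z′-factor. It compares the spread of the positive and negative controls with the gap between their means. The text does not give the exact formula used, so the Python sketch below assumes the standard definition Z′ = 1 - 3(σp + σn)/|μp - μn|. The control values are hypothetical placeholders, not data from this screen.

```python
from statistics import mean, stdev

def z_prime(positive, negative):
    """Z'-factor of one assay plate from positive- and negative-control wells.

    Assumes the standard definition Z' = 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|.
    Values closer to 1 indicate better-separated controls.
    """
    sd_sum = stdev(positive) + stdev(negative)
    separation = abs(mean(positive) - mean(negative))
    return 1 - 3 * sd_sum / separation

# Hypothetical normalized Gfp signals for one plate (illustrative only).
si_trim28 = [0.92, 0.95, 0.90, 0.93]  # positive-control wells
si_nt = [0.05, 0.04, 0.06, 0.05]      # non-targeting control wells
print(round(z_prime(si_trim28, si_nt), 3))
```

Plates whose controls overlap heavily give a low or negative Z′. Such plates would typically be flagged before any hit calling.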

Next, we carried out a whole-genome siRNA screen targeting
20,000 genes in F9 cells (Figure 1A). Candidates that caused
excessive cell death upon siRNA knockdown were excluded
using a stringent nuclei-number cut-off. We then applied a cut-off
on the normalized Gfp signal that retained factors with values more
than 2 SDs above the mean of the negative controls (Figure 1B),
which short-listed 650 factors (Table S1). Among the hits are factors
previously implicated in the retroviral silencing process, such as
Eset, Zfp809, Yy1, and Trim28. In addition, new candidates identified
include Ube2i, Pcna, Hist1h3c, Mphosph8, Adcy6, Sh3bp1, and
Thyn1 (Figure 1C).
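The two-step filter described above can be sketched in Python. First, a nuclei-count cut-off removes toxic knockdowns. Second, a Gfp cut-off at 2 SDs above the negative-control mean selects hits. This is an illustrative reconstruction, not the authors' pipeline. The function name `call_hits`, the `min_nuclei` value, and all well values are hypothetical, and the normalization behind the 0.37 threshold in Figure 1B is not specified here.

```python
from statistics import mean, stdev

def call_hits(wells, negative_controls, min_nuclei, n_sd=2.0):
    """Return (sorted hit list, Gfp threshold).

    wells: {gene: (normalized_gfp, nuclei_count)}  (hypothetical structure)
    negative_controls: normalized Gfp values from non-targeting (siNT) wells
    """
    threshold = mean(negative_controls) + n_sd * stdev(negative_controls)
    hits = []
    for gene, (gfp, nuclei) in wells.items():
        if nuclei < min_nuclei:
            continue  # excessive cell death: excluded before scoring
        if gfp > threshold:
            hits.append(gene)
    return sorted(hits), threshold

# Hypothetical plate; gene names other than Trim28 are placeholders.
wells = {
    "Trim28": (0.80, 1500),
    "GeneA": (0.07, 1600),  # below the Gfp threshold
    "GeneB": (0.60, 200),   # too few nuclei, excluded
    "GeneC": (0.45, 1400),
}
si_nt = [0.05, 0.07, 0.06, 0.04, 0.08]
hits, cutoff = call_hits(wells, si_nt, min_nuclei=500)
print(hits, round(cutoff, 3))
```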

To validate the genome-wide siRNA screen, we performed
secondary siRNA screens using the MMLV-Gfp reporter and
an independent MMLV-mCherry reporter. We observed a strong
correlation between the two reporters (Figure 1D). To minimize
possible non-specific effects from the pooled siRNAs, we designed
two pairs of short hairpin RNAs (shRNAs) for 31 candidate
genes and three non-candidate genes. shRNA validation was
performed in F9 cells, followed by FACS analysis of Gfp



(B) Dot plot for genome-wide siRNA screen. A cut-off threshold was set at 0.37 (dotted line). Candidate genes above the threshold showed significant Gfp 
reactivation. 

(C) Representative images of Gfp rescue for selected hits from the genome-wide screen. Gfp (green) and Hoechst 33342 staining of the nucleus (blue) are shown. 

(D) Secondary siRNA screen for 74 genes. Results for reactivation of proviral Gfp or mCherry reporters are shown as heatmaps. Intensity of green or red color
represents the level of reactivation of the Gfp and mCherry reporters, respectively. See Supplemental Experimental Procedures for details on the gene selection
criteria and experimental design.

(E) Validation of candidate genes using shRNA knockdown. Gfp signal was detected by FACS. The percentage of Gfp activation is shown on the y axis. Values are 
mean ± SEM from independent replicate experiments. 

See also Figure S1 and Table S1.
Figure 2. Bioinformatics Analyses for the Genome-wide siRNA Screen and the ESC Specificity of the Candidate Genes

(A) Interactome analysis. Cellular localization of the hits is indicated.

(B) Interactions observed in hits of different ranking tiers. Localization of hits is indicated as in (A). P values and number of interactions are indicated.

(C and D) Validation of MMLV-Gfp rescue by siRNA knockdown of the top candidates in D3 and E14 ESCs. Non-targeting siRNA (siNT) and siRNAs targeting
non-hits (Dnmt1, Ehmt2, Senp7) were selected as controls. (C) Representative images of Gfp rescue by siRNA knockdown of the indicated hits. Gfp (green) and
Hoechst 33342 nucleus staining (blue) are shown. (D) Bar charts of Gfp activation. Relative Gfp signal is shown on the y axis. Values are mean ± SEM from
independent replicate experiments.

(E) Representative images of MMLV-mCherry and MMLV-Gfp rescue by siRNA knockdown of selected top hits in MEF and 3T3 cells. mCherry (red), Gfp (green),
and Hoechst 33342 nucleus staining (blue) are shown.

expression. shRNA knockdown efficiencies were confirmed by
qPCR (Figure S1H) and by western blot analysis for selected genes
(Figure S1I). Notably, we observed robust Gfp reactivation for the
majority of top hits (Figure 1E). Based on the results of the secondary
siRNA and shRNA screens, we focused on the top 303 hits, whose
results agreed closely with the primary screen and which we
considered high-confidence candidates.

Network Analysis of the Candidates Reveals Multiple 
Interacting Pathways Involved in Proviral Silencing 

We performed Gene Ontology (GO), KEGG, and InterPro analyses
(Huang et al., 2009) on the top 303 hits and identified 148
statistically enriched biological processes and pathways, including
chromatin modification and organization, protein sumoylation
and phosphorylation, regulation of transcription, DNA replication,
DNA repair, and methylation (Figure S2A; Table S2).
Protein-protein interaction analysis of the high-confidence hits
revealed dense interactions among the candidate
proteins (Figure 2A). In addition, cellular component analysis
revealed that the candidates were widely distributed across different
subcellular fractions (Figures 2A and S2B). These results suggest that
proviral silencing is controlled by multilayered machinery
involving components of different cellular pathways with
varied subcellular localization.
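Enrichment tools of the kind cited here (Huang et al., 2009) test whether a term is over-represented among the hits relative to a background. DAVID reports a modified Fisher exact p value (the EASE score). The Python sketch below instead uses the plain hypergeometric upper tail to show the underlying idea. The 303 hits and the 20,000-gene background come from the text; the term sizes are hypothetical.

```python
from math import comb

def enrichment_p(k, n_hits, n_term, n_background):
    """P(X >= k): chance of drawing at least k term-annotated genes when
    n_hits genes are drawn without replacement from n_background genes,
    n_term of which carry the annotation (hypergeometric upper tail)."""
    total = comb(n_background, n_hits)
    upper = min(n_hits, n_term)
    return sum(
        comb(n_term, i) * comb(n_background - n_term, n_hits - i)
        for i in range(k, upper + 1)
    ) / total

# Hypothetical GO term: 100 annotated genes in the background, 10 among the hits.
expected = 303 * 100 / 20000  # annotated hits expected by chance
p = enrichment_p(10, n_hits=303, n_term=100, n_background=20000)
print(f"expected {expected:.2f}, observed 10, p = {p:.2e}")
```

A real analysis would also correct for testing many terms at once, for example with a false discovery rate.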

Candidate Genes Are Potent Repressors of Provirus 
Expression in Embryonic Stem Cells 

We analyzed the expression profiles of the candidate genes in
over 100 cell lines using the cTen database (Shoemaker et al.,
2012). The majority of candidate genes are highly expressed in
embryonic stem cell lines and show low expression in other
tissue-specific cell lines (Figure S2C). The expression of selected
candidates was further tested in the mouse ESC lines E14 and D3,
the mouse EC lines F9 and P19, and differentiated mouse embryonic
fibroblasts (MEFs). Consistent with the cTen enrichment scores,
qPCR analyses showed embryonal and stem cell-specific
expression of the candidates (Figure S2D).

To further interrogate the function of our candidate hits, we
performed network analysis of the hits based on their tiered
ranking. We observed more interactions among our top 50
candidates, although the lower-ranked hits also exhibited specific
interactions indicative of their biological significance (Figure 2B).
Among the top 20 hits are the histone chaperones
(Chaf1a/b), sumoylation modification genes (Ube2i, Sumo2,
Uba2, Sae1, and Senp6), and chromatin-bound factors (Eset,
Atf7ip, Zfp809, and Trim28). To test the functional specificity of these
strong candidates in mESCs, we conducted siRNA and shRNA
knockdowns in two mESC lines, E14 and D3, and in two differentiated
cell types, 3T3 cells and MEFs. The results of the Gfp reporter
rescue assay in the mESC lines agree well with the primary
screen performed in F9 cells (Figures 2C, 2D, and S2E). In contrast,
MMLV-driven expression of Gfp or mCherry was already high in 3T3
cells and MEFs at the outset, and knockdown of the candidate genes
did not perturb the reporter signal in these
cell lines (Figures 2E and S2F).

To further assess the ESC specificity of our candidates, we
differentiated E14 and D3 cells via embryoid body (EB) formation
and neural differentiation (Ying et al., 2003). The differentiated
cells lost their ESC-specific morphologies and pluripotency
markers and expressed high levels of differentiation genes (Figures
2F and S2G). Consistent with a previous report, MMLV
remains silenced in differentiated ESCs (Niwa et al., 1983).
None of the candidate gene knockdowns in the differentiated
cells rescued MMLV-Gfp reporter expression (Figures 2G
and S2H), suggesting that alternative or additional silencing
pathways are active in these cells. The relative copy number of
integrated reporters in E14 and the differentiated cells was
indistinguishable, ruling out the possibility of reduced viral integration in
the latter (Figures S2I and S2J). In addition, knockdown of the
top hits did not reduce provirus integration efficiency in E14 cells
(Figure S2K). Of note, we observed no significant change in Gfp
signal driven by an integrated non-LTR reporter (PiggyBac-CAG-Gfp)
upon knockdown of the top hits (Figures S2L and S2M). This
strongly suggests that these factors regulate the provirus at the
transcriptional or epigenetic level.

Chaf1a/b and the Sumoylation Modification Complex Play
Critical Roles in Regulating ERVs

To evaluate the roles of Chaf1a/b and the sumoylation factors in
ERV regulation, we measured ERV expression by qPCR upon
depletion of the candidates. Consistent with a previous study,
Trim28 knockdown elicited reactivation of IAP elements in
ESCs (Figure S3A) (Rowe et al., 2010). Intriguingly, we found
upregulation of class I (GLN), class II (MMERVK10c), and class III
(MERVL) elements following depletion of factors from the
Caf1 complex, the sumoylation complex, and Atf7ip (Figure S3A).
Notably, Northern blot assays confirmed increased transcription
of MERVL, but not of IAP and MusD elements, in Chaf1a/b-depleted
E14 cells (Figure S3C). Meanwhile, knockdown of
selected weaker candidates also showed consistent de-repression
of MERVL but not of the other ERVs (Figure S3B).
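The section does not state how the qPCR signals were normalized. A common choice for knockdown-versus-control comparisons like these is the 2^-ΔΔCt method. It assumes near-100% amplification efficiency for both the ERV target and a reference gene. The Python sketch below illustrates that calculation with hypothetical Ct values; it is not taken from the study's protocol.

```python
def ddct_fold_change(ct_target_kd, ct_ref_kd, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression of a target in knockdown vs. control cells (2^-ddCt).

    Each dCt normalizes the target (e.g., an ERV such as MERVL) to a
    reference gene measured in the same sample; ddCt then compares the
    knockdown sample with the control sample.
    """
    dct_kd = ct_target_kd - ct_ref_kd
    dct_ctrl = ct_target_ctrl - ct_ref_ctrl
    ddct = dct_kd - dct_ctrl
    return 2.0 ** (-ddct)

# Hypothetical Ct values: the target crosses threshold 3 cycles earlier after knockdown.
fold = ddct_fold_change(ct_target_kd=24.0, ct_ref_kd=18.0,
                        ct_target_ctrl=27.0, ct_ref_ctrl=18.0)
print(fold)
```

If primer efficiencies differ markedly from 100%, an efficiency-corrected model would be more appropriate.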

To further delineate the regulatory roles of the candidates on
ERVs, we performed genome-wide RNA sequencing (RNA-seq)
of Chaf1a/b-, Sumo2-, Sae1-, Ube2i-, Uba2-, Senp6-,
Trim28-, Eset-, and Atf7ip-depleted cells. Transcriptomic analyses
revealed significant de-repression of several families of
ERVs upon depletion of each factor (Figure 3A; Table S3). In
contrast to their effects on global gene expression (Figure S3D),
the majority of the ERV targets were upregulated upon shRNA
knockdown (Figure 3B). Together, these results suggest an
ERV-specific repressive function of the candidates.
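Figure 3B summarizes the direction of change for the ERVs affected by each knockdown. The Python sketch below shows one simple way to compute log2 fold changes against the shVector control and the fraction of changed elements that go up. It assumes normalized counts, a pseudocount of 1, and a |log2FC| ≥ 1 cut-off. These choices are illustrative and ignore the statistical testing that would decide which changes are significant. Element names and counts are hypothetical.

```python
from math import log2

def log2_fold_change(kd, ctrl, pseudocount=1.0):
    """log2 ratio of knockdown to control expression; the pseudocount avoids log2(0)."""
    return log2((kd + pseudocount) / (ctrl + pseudocount))

def fraction_up(changes, cutoff=1.0):
    """Of the elements with |log2FC| >= cutoff, the fraction that increased."""
    up = sum(1 for c in changes if c >= cutoff)
    down = sum(1 for c in changes if c <= -cutoff)
    return up / (up + down) if up + down else 0.0

# Hypothetical normalized counts: (knockdown, shVector control)
counts = {"ERV_a": (400, 50), "ERV_b": (30, 31), "ERV_c": (7, 63), "ERV_d": (255, 15)}
lfc = {name: log2_fold_change(kd, ctrl) for name, (kd, ctrl) in counts.items()}
print({name: round(v, 2) for name, v in lfc.items()})
print(fraction_up(lfc.values()))
```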

Next, we evaluated the ERV classes regulated by the candidates.
Chaf1a/b depletion resulted in the de-repression of large
numbers of class III ERVs, while the sumoylation and canonical
factors regulated more class II ERVs (Figure 3C). Cluster analysis
detected strong correlation of ERV regulation within the



(F) Representative images of Oct4 and Nestin staining in E14 cells (upper panel) and in neural cells differentiated from E14 ESCs (lower panel).

(G) MMLV-Gfp rescue in E14-derived neural cells by siRNA knockdown of selected top hits. Relative Gfp signal is shown on the y axis. Values are mean ± SEM
from independent replicate experiments.

See also Figure S2 and Table S2. 
Figure 3. Histone Modifiers and Sumoylation Factors Regulate ERVs in Mouse Embryonic Stem Cells

(A) Frequency histogram of gene expression from RNA-seq data after Chaf1a, Sumo2, Trim28, or Eset depletion in E14 cells. The log2 fold change of expression
levels is shown on the x axis, and the number of genes at a given expression level on the y axis.

(B) Percentage stacked columns indicating the up- or downregulation of ERVs upon depletion of the indicated factors.

(C) Percentage stacked columns indicating the classes of upregulated ERVs upon depletion of the indicated factors.

(D) Clustering analysis of the indicated RNA-seq libraries based on differential ERV expression. Heatmap color intensity signifies correlation strength on a scale
from 0 (red, high similarity) to 0.8 (yellow, high difference).

(E and F) Genome-wide deregulation of ERVs in E14 cells after depletion of the indicated genes. RNA-seq data for the RNAi samples and the shVector control were
used to calculate the log2 fold change values. Red dots indicate the elements with significantly increased expression.

(G and H) Venn diagrams demonstrating the number of commonly and differentially upregulated ERVs among the depletion of indicated factors. 

See also Figure S3 and Table S3. 
Chaf1a/b, the sumoylation factors, and the chromatin-binding factors
Trim28, Atf7ip, and Eset (Figure 3D), whereas the analysis
of global gene expression displayed a different pattern
(Figure S3E).
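Cluster analyses such as Figure 3D rest on pairwise similarity between knockdown profiles. The Python sketch below computes Pearson correlations between per-ERV log2 fold-change vectors. One plausible reading of the Figure 3D scale, where 0 means high similarity, is a distance of 1 - r, so the sketch reports that as well. The profiles are hypothetical, and this section does not specify the study's exact similarity metric or clustering method.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def pairwise_distances(profiles):
    """Map each pair of knockdowns (names in sorted order) to (r, 1 - r)."""
    result = {}
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        result[(a, b)] = (r, 1 - r)
    return result

# Hypothetical log2 fold changes for the same four ERV families in each knockdown.
profiles = {
    "Sumo2": [2.0, 1.5, 0.1, 3.0],
    "Sae1": [1.8, 1.4, 0.0, 2.7],
    "Chaf1a": [0.2, 0.1, 2.5, 0.3],
}
res = pairwise_distances(profiles)
for pair, (r, d) in res.items():
    print(pair, round(r, 3), round(d, 3))
```

The resulting distance matrix could then be passed to any hierarchical clustering routine.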

Remarkably, Trim28 shares significant similarity with both the
Chaf1a/b and sumoylation factors in ERV regulation (Figure
3D), suggesting overlapping mechanisms. ERVs controlled
by Atf7ip overlapped extensively with those regulated by
Trim28 or Eset (Figure S3F), indicating that Atf7ip may be integral
to the canonical KRAB-Zfp/Trim28/Eset machinery. Atf7ip has been
shown to be a co-factor of Eset that facilitates the conversion
of H3K9me2 to H3K9me3 (De Graeve et al., 2000; Wang
et al., 2003). Furthermore, the ERVs regulated by Chaf1a overlap
significantly with those regulated by Chaf1b (Figures
3D, 3E, and 3G) but differ significantly from those controlled by
Sumo2 (Figures 3F and 3H). One key feature of the cluster of
sumoylation genes is the strong correlation between the factors in
the specific control of their ERV targets, as shown by the tight
pairwise correlation (Figure 3E). This suggests a coordinated
mechanism involving multiple members of the same sumoylation
pathway. Interestingly, most ERVs regulated by Sae1 and Ube2i
are part of the larger set of ERVs governed by Sumo2, suggesting
a central role for Sumo2 in this sumoylation process (Figure
3G). It is noteworthy that many ERVs regulated by Sumo2 are
similarly governed by Trim28 (Figures 3F and 3H).

To validate the RNA-seq data, we performed qPCR on each
class of ERVs (Figure S3G). Consistently, RLTR6_Mm/ERV1
was specifically regulated by the sumoylation factors, while
ETnERV3-int/ERVK was regulated by Atf7ip, Eset, and Chaf1a, but
not by the sumoylation factors. MT2_Mm/ERVL was sharply
upregulated upon depletion of Chaf1a/b, whereas its expression was
less perturbed by depletion of factors from the other two clusters.
Finally, LTR16D was upregulated upon depletion of genes
from all the clusters.

Chaf1a and Sumo2 Are Directly Recruited to ERVs

We wanted to determine whether Chaf1a and Sumo2 are enriched
on genomic ERVs. First, we introduced 3xHA tags at
the 3′ end of the endogenous Chaf1a locus in F9 cells using
CRISPR/Cas technology (Figure S4A). The Chaf1a-3xHA cell
line was characterized by shRNA knockdown, which led to the
specific reduction of Chaf1a-3xHA as measured by western
blot and immunostaining (Figure S4B). In addition, a Zfp809-3xHA
overexpression D3 cell line was established and similarly
characterized (Figure S4C). The reliability of the Sumo2
antibodies used for chromatin immunoprecipitation (ChIP) was
confirmed by knockdown of Sumo2 followed by western blot
analysis (Figure S4D). To survey the global binding profiles
of Chaf1a, Sumo2, Trim28, and Zfp809 on genomic ERV loci,
we performed ChIP sequencing (ChIP-seq). The quality of
the ChIP DNA was determined by qPCR and motif analysis.
Zfp809-3xHA ChIP-qPCR yielded high enrichment at the proline
PBS site (Figures S4E and S4F), and Trim28 ChIP-qPCR showed
strong binding at a previously reported target gene, Ptpn18
(Figure S4G) (Hu et al., 2009).

ChIP-seq analysis revealed that both Chaf1a and Sumo2 are
recruited to loci of members of several classes of ERVs (Figures
4A and 4B; Table S4). We next asked if the bound ERV
loci are enriched for any histone modifications. We compared
the Chaf1a, Sumo2, Trim28, and Zfp809 ChIP-seq data with
publicly available datasets of histone marks and Eset ChIP-seq.
Although the majority of ERVs bound by Chaf1a are enriched
with H3K9me3 (Figure 4C), the H3K9me3 signal is of lower
intensity than that of Trim28-, Zfp809-, and Sumo2-bound
ERVs (Figure 4C). Intriguingly, a considerable proportion
(15%) of Chaf1a-bound ERVs are also enriched for the active
H3K4me3 modification (Figure 4C). Furthermore, Chaf1a-bound
ERVs exhibit higher levels of H3K4me2 and H3K9Ac
(Figure S4H). This raises the possibility that additional accessory
proteins may be required for Chaf1a to exert its silencing
effects. Notably, Sumo2-targeted ERV loci are associated with
elevated H3K9me3 levels and reduced levels of the H3K4me3
modification, a binding pattern that strongly resembles that of
Zfp809 and Trim28 (Figure 4C). In contrast, the non-ERV loci
bound by Chaf1a were enriched with abundant H3K4me3
marks and had no trace of H3K9me3 modifications, whereas
Sumo2/Trim28/Zfp809-bound non-ERV loci exhibit detectable
but low levels of H3K9me3 (Figure 4D). Collectively, these data
indicate that individual factors control ERV and non-ERV targets
through different modes of regulation (Figures 4C, 4D, and S3C).

To compare the ERV targets of Chaf1a and Sumo2, we represented
the ERV loci bound by these factors in Venn diagrams. We
found that Trim28 binds 56% of Chaf1a-bound sites, while
57% of Chaf1a-bound ERVs are also targets of Sumo2 (Figure 4E).
Moreover, only 31% of Chaf1a-bound ERV loci are enriched for
Zfp809 (Figure S4I). In contrast, 77% of Trim28 targets and
73% of Eset-bound ERVs are accompanied by enrichment of
Sumo2 (Figures 4E and S4I). When we extended the analysis to
three factors, we observed that more than 80% of Chaf1a/Trim28
and Chaf1a/Eset common targets have Sumo2 binding
(Figure 4F). These observations strongly suggest a
role for Sumo2 in Trim28/Eset ERV regulation. The co-regulation
of Chaf1a and Sumo2 with the canonical Zfp809/Trim28/Eset
machinery seems to be ERV-specific, as very little overlap was
observed between the factors on non-ERV loci (Figure S4J).
Collectively, in terms of ERV regulation, Chaf1a binding clusters
away from the Sumo2/Zfp809/Trim28/Eset axis (Figures
4G, 4H, and S4K), a pattern remarkably similar to that
observed in the RNA-seq data (Figure 3D). Overall, our ChIP
data provide the first biochemical demonstration that a histone
chaperone and a sumoylation modification protein can directly
regulate genomic ERVs.
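The overlap percentages above are directional. The share of Chaf1a-bound sites that Trim28 also binds need not equal the share of Trim28 sites bound by Chaf1a. The minimal Python sketch below makes this explicit using hypothetical locus identifiers. It treats each ERV locus as a discrete ID; working from raw peak calls would instead require genomic interval intersection.

```python
def percent_shared(query, other):
    """Percentage of loci in `query` that are also present in `other`."""
    if not query:
        return 0.0
    return 100.0 * len(query & other) / len(query)

# Hypothetical ERV locus IDs (not from the study's ChIP-seq data).
chaf1a = {"L1", "L2", "L3", "L4"}
trim28 = {"L1", "L2", "L5"}
sumo2 = {"L1", "L2", "L3", "L5", "L6"}

print(percent_shared(chaf1a, trim28))          # Chaf1a sites also bound by Trim28
print(percent_shared(trim28, chaf1a))          # the reverse direction differs
print(percent_shared(chaf1a & trim28, sumo2))  # three-factor check on common targets
```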

Sumo2 Orchestrates the Viral Silencing Activities of 
Trim28 through Its Sumoylation Modification 

Our genome-wide siRNA screen identified Sumo2, but not
Sumo1 or Sumo3, as having a distinct role in proviral silencing
(Figures S5A-S5C). The global RNA-seq and ChIP-seq data further
suggest that Sumo2 may repress proviruses and ERVs through
modulation of the Trim28/Eset machinery (Figure 5A). To test
this possibility, we first performed Sumo2 ChIP-qPCR and identified
its binding on the proviral LTR. Importantly, when Trim28
was knocked down, the level of Sumo2 binding on both proviral
elements and most of the ERVs tested was drastically reduced
(Figures 5B and 5C). In contrast, enrichment of Sumo2 was not
Figure 4. Direct Recruitment of Chaf1a and Sumo2 to Genomic ERVs

(A) Heatmap indicating the recruitment of Sumo2, Trim28, Zfp809, and Chaf1a on the indicated ERVs of different classes (I-III) and LINE/SINE elements (LS).
ChIP-seq was performed for the indicated factors; Smad3 is used as a control. Red indicates binding, whereas black indicates the absence of binding.

(B) Heatmaps of Chaf1a enrichment at the genomic regions flanking MER67C and MMERVK10c-int (left panels) and Sumo2 enrichment at the genomic regions
flanking RLTR6 and ETnERV3-int (right panels).

(C) Heatmaps of histone modifications at the genomic regions of the ERV loci bound by Chaf1a, Sumo2, Trim28, and Zfp809. The heatmaps are clustered
according to the enrichment profile of H3K4me3.

affected by Chaf1a knockdown (Figures S5D and S5E). Furthermore,
the removal of Sumo2 abolished the binding of Trim28 at
the LTR (Figure 5D).
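ChIP-qPCR enrichment such as that in Figures 5B-5D is often expressed as percent of input. The section does not describe the normalization used. The Python sketch below therefore assumes the common percent-input calculation, in which the input Ct is first adjusted for the fraction of chromatin saved as input. All Ct values are hypothetical.

```python
from math import log2

def percent_input(ct_ip, ct_input, input_fraction):
    """ChIP signal as a percentage of total input chromatin.

    input_fraction: share of chromatin kept as input (0.01 for a 1% input).
    Subtracting log2(1 / input_fraction) rescales the input Ct to 100%.
    """
    ct_input_adjusted = ct_input - log2(1 / input_fraction)
    return 100 * 2 ** (ct_input_adjusted - ct_ip)

# Hypothetical Sumo2 ChIP at a proviral LTR in control vs. Trim28-knockdown cells.
control = percent_input(ct_ip=28.0, ct_input=25.0, input_fraction=0.01)
knockdown = percent_input(ct_ip=31.0, ct_input=25.0, input_fraction=0.01)
print(control, knockdown, control / knockdown)
```

In this example, a 3-cycle later IP Ct after knockdown corresponds to an 8-fold loss of enrichment, assuming equal amplification efficiency.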

To interrogate whether Sumo2 directly targets Trim28 for sumoylation,
we used well-characterized 3xFlag-Sumo2 E14
cells generated using CRISPR/Cas technology (Figures S5F-



Figure 5. Sumo2 Regulates Proviruses by 
Post-translational Modification of Trim28 

(A) Venn diagrams demonstrating the number of
common and uniquely bound ERV loci among the
indicated factors. Sumo2 interacts extensively
with the factors from the canonical pathway.
Percentage values indicate uniquely bound sites.
(B-D) Sumo2 functions through Trim28 in proviral
silencing. Sumo2 and Trim28 ChIP experiments
were conducted on samples depleted of
Sumo2 or Trim28. The enrichment was measured
by qPCR. Data are presented as mean ± SEM from
independent replicate experiments.

(E) Trim28 is modified by Sumo2 in vivo. A 3xFlag
tag was added to the 5′ end of the Sumo2 genomic
region using CRISPR/Cas in E14 cells. Two
homozygous lines were selected for the
immunoprecipitation assays. NEM was added to protect
the sumoylated proteins from desumoylation by
SENPs in the cell lysates.

(F) Venn diagrams demonstrating the number of
common and uniquely bound ERV loci among the
indicated factors. The majority of the
Trim28/H3K9me3-enriched ERVs are also bound by
Sumo2. Percentage values indicate uniquely
bound sites.

(G and H) Knockdown of Sumo2 and Trim28
significantly reduced H3K9me3 enrichment on the
proviral PBS and ERVs. H3K9me3 ChIP was
performed on samples depleted of Sumo2 or
Trim28. Data are presented as mean ± SEM from
independent replicate experiments.

(I) Knockdown of Trim28 and Sumo2 increased the
active H3K4me3 mark on proviral elements.
H3K4me3 ChIP was performed on samples
depleted of Sumo2 or Trim28. Data are presented as
mean ± SEM from independent replicate experiments.

See also Figure S5. 



S5I). Notably, we identified Trim28 in 
the pull-down of sumoylated proteins 
(Figure 5E). 

Venn diagram analysis of ChIP-seq 
data indicates that ~90% of Sumo2/ 
Trim28-bound ERV sites are marked with H3K9me3 modifica- 
tions (Figure 5F). Trim28 is known to mediate the recruitment 
of Eset, which in turn deposits the repressive H3K9me3 mark 
at the proviral LTR (Matsui et al., 2010). Consistently, Sumo2 
knockdown resulted in concomitant reduction in H3K9me3 
marks and elevation of H3K4me3 modifications at the proviral 







(D) Enrichment of several histone marks at the genomic regions of the non-ERV loci that are bound by indicated factors. The reads in the heatmaps are clustered 
according to the enrichment profile of H3K4me3. 

(E and F) Venn diagrams demonstrating the number of commonly and uniquely-bound ERV loci among the indicated factors. Percentage values indicate uniquely 
bound sites. 

(G) UCSC genome browser screenshots. Chaf1a, Sumo2, Trim28, and Zfp809 bind ETnERV3-int/ERVK, while IAP-d-int/ERVK is bound specifically by Sumo2
and Trim28. Both ERVs are enriched with H3K9me3.

(H) Clustering analysis of the ERVs bound by the indicated factors. Color intensity signifies the strength of correlation: red indicates strong correlation, whereas
yellow indicates weak correlation.

See also Figure S4 and Table S4. 
elements and ERVs to levels comparable with those seen
upon Trim28 knockdown (Figures 5G-5I).

Chaf1a Has Differential Regulatory Roles on Class I, II,
and III ERVs

Venn diagram analysis of ERVs bound by Chaf1a and Trim28 and of
those associated with the H3K9me3 modification revealed that
about 64% of ERVs co-bound by Chaf1a and Trim28 are enriched
with H3K9me3 (Figure 6A). In comparison, only 23% of
Chaf1a/Trim28-bound non-ERV loci are marked with H3K9me3
(Figure S6A). This concurs with the notion that Chaf1a and
Trim28 exert ERV-specific repressive functions. Notably,
significant numbers of ERVs that are co-bound by Chaf1a
and Trim28, or bound exclusively by Chaf1a, are not marked
with H3K9me3, suggesting that Chaf1a may adopt alternative
repressive mechanisms on these ERVs. To explore this, we classified
the ERVs into four categories, namely, those bound by
Chaf1a+Trim28+H3K9me3, Chaf1a+Trim28, Chaf1a only, and
Trim28 only (Table S5). Interestingly, the Chaf1a-only category
has the highest percentage of class III ERVs (Figure 6B), while
the Chaf1a+Trim28+H3K9me3 category consists primarily of
class I and class II ERVs (Figure 6B). Accordingly, the dot
plots (Figures 6C and S6B) correlating ERV upregulation with
the enrichment of histone marks further highlighted the low levels
of H3K9me3 on Chaf1a-regulated class III ERVs.
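The four-way categorization described above can be expressed as simple set logic. The Python sketch below is a hypothetical reconstruction. It assumes that "Chaf1a only" means bound by Chaf1a but not Trim28, and that "Trim28 only" means bound by Trim28 but not Chaf1a, regardless of H3K9me3. The locus IDs and class labels are made up for illustration. Tallying categories by ERV class gives the kind of composition summarized in Figure 6B.

```python
from collections import Counter, defaultdict

def classify_erv(locus, chaf1a, trim28, h3k9me3):
    """Assign a locus to one of four binding categories, or None if unbound."""
    in_c, in_t = locus in chaf1a, locus in trim28
    if in_c and in_t:
        return "Chaf1a+Trim28+H3K9me3" if locus in h3k9me3 else "Chaf1a+Trim28"
    if in_c:
        return "Chaf1a only"
    if in_t:
        return "Trim28 only"
    return None

def class_composition(erv_class, chaf1a, trim28, h3k9me3):
    """Count ERV classes (I, II, III) within each binding category."""
    table = defaultdict(Counter)
    for locus, cls in erv_class.items():
        category = classify_erv(locus, chaf1a, trim28, h3k9me3)
        if category is not None:
            table[category][cls] += 1
    return table

# Hypothetical loci and ERV classes.
erv_class = {"e1": "I", "e2": "II", "e3": "III", "e4": "III", "e5": "II"}
chaf1a = {"e1", "e2", "e3", "e4"}
trim28 = {"e1", "e2", "e5"}
h3k9me3 = {"e1", "e5"}
table = class_composition(erv_class, chaf1a, trim28, h3k9me3)
for category, counts in table.items():
    print(category, dict(counts))
```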

Specific class III ERVs are highly expressed in early embryonic
development and downregulated at the morula and blastula
stages. The histone demethylase Kdm1a (Macfarlan et al., 2012)
and the H3K9 dimethyltransferase G9a are key epigenetic
regulators of these ERVs (Leung et al., 2011; Maksakova et al.,
2013). Kdm1a and the histone deacetylases Hdac1/2 have been
shown to contribute cooperatively to transcriptional silencing
(Shi et al., 2004), and Hdacs repress MERVL in
concert with Kdm1a in pluripotent stem cells (Macfarlan et al.,
2011; Reichmann et al., 2012). Interestingly, Kdm1a is one of
the candidate hits in our siRNA screen (Table S1). To further
dissect the mode of ERV regulation within each of the four
categories, we integrated our Chaf1a and Trim28 ChIP-seq data with
datasets for epigenetic factors such as Kdm1a and Hdac1/2.
Surprisingly, the ERVs in the Chaf1a only category display
the highest enrichment of Kdm1a and Hdac1/2 compared
with the other categories (Figures 6D and 6E). In contrast,
ERVs bound by Chaf1a+Trim28+H3K9me3 exhibit low levels of
Kdm1a and Hdac1/2 binding (Figures 6D-6F and S6D).
Consistently, the Chaf1a only category is characterized by significantly
higher levels of H3K4me2, H3K9Ac, and H3K27Ac marks, which
are substrates of Kdm1a and Hdacs, respectively (Figures 6D
and S6C). We further analyzed ERV expression using a
published mESC Kdm1a knockdown RNA-seq dataset (Agarwal
et al., 2015). Kdm1a knockdown resulted in upregulation of mostly
class I and III ERVs, similar to Chaf1a knockdown (Figure S6E).
In terms of the ERVs regulated, the Kdm1a and Chaf1a knockdowns
overlap 80% more than the Kdm1a and Trim28 knockdowns do
(Figures S6F and S6G). Overall, our data indicate that Chaf1a
regulates class I, II, and III ERVs through vastly different
mechanisms, which may depend on its co-regulators.
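The phrase "80% more overlap" can be read as the relative difference between two intersection sizes. The sketch below assumes that reading, using raw counts of ERVs upregulated in both knockdowns. The helper names are hypothetical, and the paper does not state whether overlaps were counted or expressed as fractions.

```python
from typing import Set


def shared_targets(a: Set[str], b: Set[str]) -> int:
    """Number of ERVs upregulated in both knockdowns."""
    return len(a & b)


def percent_more(x: int, y: int) -> float:
    """By how many percent overlap x exceeds overlap y."""
    if y == 0:
        raise ValueError("reference overlap must be non-zero")
    return 100.0 * (x - y) / y
```

For instance, 18 shared ERVs versus 10 shared ERVs gives `percent_more(18, 10) == 80.0`. These counts are illustrative only and are not the values in Figures S6F and S6G.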

Chaf1a Represses Proviruses through Epigenetic
Co-factors

Chaf1a is a core component of the chromatin assembly factor 1
complex (Caf1), which also includes Rbbp4. Interestingly, only
Chaf1a/b exhibited a proviral silencing function, whereas
knockdown of Rbbp4 had no effect (Figures S7A and S7B). Moreover,
our siRNA screen did not uncover other histone chaperones
necessary for retroviral silencing, further highlighting the
specificity of Chaf1a/b in this process (Figure S7B). To further
delineate the function of Chaf1a, we performed a pull-down of
Flag-tagged Chaf1a followed by stable isotope labeling by
amino acids in cell culture (SILAC)-based quantitative mass spectrometry
(MS) analysis (Figure 7A). The complete list of Chaf1a-interacting
proteins includes several known and unknown factors (Figure 7A;
Table S6). Chaf1a has previously been shown to interact with
chromatin-modifying factors (Quivy et al., 2004; Sarraf and
Stancheva, 2004). Indeed, we identified several epigenetic
modifiers that appeared in both the Chaf1a MS list and the genome-wide
siRNA screen list, such as Kdm1a, Smarcc1, and Eset. Using
co-immunoprecipitation (coIP), we confirmed the interaction
of Chaf1a with the histone methyltransferase Eset, the histone
demethylase Kdm1a, the deacetylase Hdac2, and the histone chaperone
Chaf1b (Figures 7B-7D, S7C, and S7D).
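In a SILAC pull-down, interactors are typically called from the ratio of heavy- to light-labeled peptide intensities. The sketch below makes three assumptions that do not come from the paper: the heavy channel is the Flag-Chaf1a IP, the light channel is a control IP, and the log2 ratio cutoff is an arbitrary 1. The function `silac_interactors` is hypothetical.

```python
import math
from typing import Dict, List


def silac_interactors(
    heavy: Dict[str, float], light: Dict[str, float], min_log2_ratio: float = 1.0
) -> List[str]:
    """Return proteins enriched in the (assumed) heavy Flag-Chaf1a IP channel.

    Proteins quantified in only one channel are skipped here, because the
    heavy/light ratio is undefined without both intensities.
    """
    hits = []
    for protein, heavy_intensity in heavy.items():
        light_intensity = light.get(protein, 0.0)
        if heavy_intensity <= 0 or light_intensity <= 0:
            continue
        if math.log2(heavy_intensity / light_intensity) >= min_log2_ratio:
            hits.append(protein)
    return sorted(hits)
```

A log2 ratio of 1 corresponds to two-fold enrichment over the control IP. In practice, the cutoff is chosen from the spread of ratios for background proteins.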

To investigate the direct effects of Chaf1a at provirus loci, we
used the CRISPR-engineered Chaf1a-3xHA F9 cell line for ChIP-qPCR
analysis. We observed direct localization of Chaf1a to the proviral
LTR elements (Figure 7E), which was further confirmed by
Chaf1a-V5 ChIP (Figure S7E). To address the relationship
between Chaf1a and Trim28, we performed Trim28 ChIP upon
Chaf1a knockdown. Trim28 binding was abolished by
knockdown of Trim28 itself, whereas Chaf1a
knockdown had no effect (Figure S7F). This suggests that
Trim28 recruitment to the provirus is independent of Chaf1a.
Moreover, we did not detect any change in Chaf1a enrichment
upon Sumo2 depletion (Figure S7G).



(B) Percentage stacked columns showing the classes of ERVs bound in each category indicated on the x axis.

(C) Correlation between the upregulation of different classes of ERVs upon Chaf1a depletion and the enrichment of the H3K9me3 mark. The data are plotted
using shChaf1a RNA-seq and H3K9me3 ChIP-seq. Grey, orange, and yellow dots represent ERVs with significantly increased expression in classes I, II, and III,
respectively. Black dots indicate non-regulated ERVs.

(D) Average binding profiles of the individual categories show that ERVs in the Chaf1a only and Chaf1a+Trim28 categories are highly enriched for
Kdm1a and Hdac2 compared with the other categories.

(E) Enrichment of H3K9me3, Kdm1a, and Hdac2 in the genomic regions of the indicated categories. Reads in the heatmaps are clustered according to the
enrichment profile of H3K9me3.

(F) UCSC genome browser screenshots of representative repeat elements. RMER16-int, bound by Chaf1a and Trim28, is highly enriched for H3K9me3. In
contrast, ORR1B2 is bound by Chaf1a, Trim28, Hdac2, and Kdm1a with very low H3K9me3 enrichment. Chaf1a, Hdac2, and Kdm1a bind RLTR11B in the
absence of Trim28 and H3K9me3, while LTRIS5 is bound exclusively by Trim28.

See also Figure S6 and Table S5.
Figure 7. Chaf1a Is Enriched at Proviruses and Regulates Their Expression through Its Interacting Epigenetic Co-factors

(A) SILAC mass spectrometry (MS) analysis uncovers the Chaf1a interactome network. Upper panel: schematic representation of the SILAC MS workflow as
described in the Supplemental Experimental Procedures. Lower panel: differential protein identification in Flag-tagged Chaf1a immunoprecipitation. Several epigenetic and
chromatin regulators are indicated.

(B-D) Western blots showing co-immunoprecipitation (coIP) of Chaf1a with Eset, Kdm1a, and
Hdac2, confirming the interacting proteins identified by MS.

(E) Chaf1a is enriched at the proviral elements. Chaf1a-3xHA ChIP was carried out in the F9 Chaf1a-3xHA cell line using an HA antibody. Enrichment was analyzed
by qPCR. Data are presented as mean ± SEM from independent replicate experiments.
To understand the mechanisms by which Chaf1a silences the
newly introduced proviruses, we performed ChIP on the Chaf1a-interacting
histone modifiers Kdm1a and Hdac2. To our surprise,
both Kdm1a and Hdac2 were enriched at the proviral LTR
(Figures 7F and 7G). In addition, consistent with the siRNA screen,
shRNA knockdown of Kdm1a rescued the expression
of the MMLV-Gfp reporter (Figure S7H). Treatment of E14 cells with
the Hdac inhibitor TSA also relieved silencing of the MMLV-Gfp
reporter (Figure S7I). Next, we tested the dynamic changes of
the histone marks on the provirus LTR and ERVs upon
Chaf1a depletion. The enrichment of H3K9me3 on the provirus LTR was
slightly reduced (Figure 7H), while the active H3K4me3 and total
H3Ac marks were significantly increased (Figures 7I, 7J, and
S7J-S7L). Together, our data show that the repressive function
of Chaf1a on proviruses is reinforced by its interacting
partners Kdm1a, Hdac2, and Eset.

To test whether Chaf1a can directly bind the viral DNA, we
performed electrophoretic mobility shift assays (EMSA). We did not
observe a specific EMSA band for the Chaf1a protein, indicating
that Chaf1a does not bind directly to the viral DNA (Figures S7M-S7O).
The Caf1 complex is thought to assemble histones H3/H4
during DNA replication and repair (Gaillard et al., 1996; Kaufman
et al., 1995). Other studies have indicated that the histone
chaperones Asf1a/b work synergistically with Caf1 (Tyler et al., 1999).
Our proteomics data also identified Asf1a/b as components of
the Chaf1a interactome (Figure 7A), and the interaction between
Chaf1a and Asf1a/b was confirmed by coIP (Figure S7P).
To further test the function of histone assembly in proviral
silencing, we performed single and combinatorial shRNA
knockdown of Asf1a/b. Surprisingly, combinatorial depletion of Asf1a/b
induced strong Gfp reactivation to a level comparable to that
observed following Chaf1a depletion (Figure S7Q), indicating
functional redundancy between Asf1a and Asf1b. These data
substantiate a possible role of histone assembly in the silencing of
proviral elements and ERVs.

DISCUSSION 

Mammalian genomes are cluttered with endogenous viral
elements, vestiges of a long history of coevolution with
retrotransposons that have shaped the genome. Complex mechanisms
have evolved to manage these elements, restricting their
expression and reactivation. Silencing of retroviruses also played a
fortuitous role in the development of somatic cell reprogramming
by transcription factors: extinction of the reprogramming
transgenes, which occurs when fibroblasts revert to a pluripotent
state, is essential for induced pluripotent stem cells to avoid
oncogenic transformation and manifest their multi-lineage
differentiation potential (Takahashi and Yamanaka, 2006). Our work
provides insights into the role of the histone chaperone Chaf1a
and the sumoylation factor Sumo2 in the silencing of exogenous
proviruses and ERVs. It supports a model in which Chaf1a promotes
the deposition of histones H3/H4, thereby marking the integrated
proviral DNA for silencing and helping to localize the Chaf1a protein
to the viral LTR region (Figure 7K). The binding and transcriptional
repression of the proviral chromatin by Chaf1a is further
reinforced by the enzymatic activities of the Chaf1a-interacting proteins
Eset, Kdm1a, and Hdac1/2, which modify proviral chromatin
with the repressive histone mark H3K9me3 and reduce the
acquisition of activating H3K4me3 and H3Ac marks (Figure 7K). In
parallel, Sumo2 plays a critical role in the canonical
Zfp809/Trim28/Eset complex via post-translational sumoylation
of Trim28. Sumoylation enhances the recruitment of Trim28 to
the proviral DNA, which in turn results in the modification of
proviral chromatin with repressive H3K9me3 marks
(Figure 7K). Our unbiased screen for factors involved in proviral
silencing has thus revealed a complex set of genetic and
epigenetic mechanisms by which exogenous proviruses and ERVs
are transcriptionally silenced in pluripotent stem cells.

Cross-Talk between the Sumoylation Pathway and the 
Canonical Complex 

Among the Sumo2-related candidates, Senp6 deconjugates
Sumo2 from targeted proteins (Mukhopadhyay and Dasso,
2007), while the other factors are involved in covalent attachment
of Sumo2 to the targeted proteins (Desterro et al., 1999;
Geiss-Friedlander and Melchior, 2007; Gong et al., 1999; Hay, 2005;
Johnson, 2004; Zhao, 2007). As such, it is tempting to speculate
that modification of key determinants by sumoylation or
desumoylation may affect their capacity to silence the proviruses and
ERVs. Cross-talk between subunits of the chromatin-modifying complex
(such as Trim28, Atf7ip, and Eset) and sumoylation
factors can be inferred from the overlap of their target ERVs, as
well as from their close protein-protein interactions. Importantly, our
study clarifies the mechanism by which Sumo2 targets the
proviral elements and ERVs: through the sumoylation of Trim28.
Furthermore, Sumo modification of other epigenetic factors
may also help mediate heterochromatin formation. It will
be of great interest to determine the proteome-wide set of
sumoylated proteins in ESCs.

(F and G) Localization of Kdm1a and Hdac2 on proviral DNA. ChIP was performed using antibodies against Kdm1a or Hdac2, and enrichment was tested by
qPCR.

(H-J) Changes in histone mark enrichment on proviral elements upon Chaf1a depletion in F9 cells. H3K9me3, H3K4me3, and H3Ac ChIP were
performed on Chaf1a-depleted samples. Data are presented as mean ± SEM from independent replicate experiments.

(K) Schematic model for the silencing mechanism of proviruses in mESCs involving Chaf1a, Sumo2, and the canonical Zfp809/Trim28/Eset pathway. Chaf1a
and its upstream histone chaperones Asf1a/b promote the deposition of histones H3/H4 to mark the integrated proviral DNA. Transcriptional repression of the
proviral chromatin is reinforced by the enzymatic activities of Chaf1a-interacting proteins, including members of the NuRD complex (Kdm1a, Hdac1/2) and
Eset. This results in reduced acquisition of activating H3K4me3 and H3Ac marks. In parallel, Sumo2 sumoylates Trim28, which is necessary for recruiting Trim28
onto the proviral DNA, in turn resulting in deposition of the repressive H3K9me3 mark.

See also Figure S7 and Table S6.

Regulation of Different Classes of ERVs

Our RNA-seq analysis indicates that Chaf1a/b and sumoylation
factors regulate different families of ERVs. Localization of Chaf1a
and Sumo2 at ERV loci was confirmed by ChIP-seq analysis. It
is noteworthy that the pattern of ERVs regulated by Chaf1a
is distinct from that of the sumoylation machinery or the
chromatin-modifying factors (Trim28, Eset, and Zfp809).
Interestingly, Chaf1a regulates a significant number of class III ERVs
that are not marked with H3K9me3 but are instead enriched
for H3K4me2 and H3K27Ac. Moreover, at these class III ERVs, Chaf1a works with the
enzymatic epigenetic modifiers Kdm1a and Hdac2.
In addition, Chaf1a cooperates with Trim28 to repress
class I and II ERVs by reinforcing high levels of
H3K9me3 at these loci. Thus, our study highlights
how a chaperone like Chaf1a regulates different classes of
ERVs through distinct interacting co-factors.

Suppressive Function of Histone Chaperone Chaf1a/b
on Newly Integrated Proviruses

Caf1 has been reported to have diverse functions, including
epigenetic regulation, DNA damage repair, and DNA replication
(Green and Almouzni, 2003; Kaufman et al., 1995; Poleshko
et al., 2010; Shibahara and Stillman, 1999). More recently,
Chaf1a was shown to be critical for maintaining the
heterochromatin state through its interactions with HP1, MBD1, and Eset
(Murzina et al., 1999; Reese et al., 2003; Sarraf and Stancheva,
2004). In fact, protein structure analysis of Chaf1a identifies a
PXVXL pentapeptide motif at the N terminus, which allows
Chaf1a to interact specifically with the HP1 chromo shadow
domain (Thiru et al., 2004). Stable association of Chaf1a with
HP1 proteins may lead to its retention in heterochromatin
(Murzina et al., 1999). HP1 proteins are “readers” of repressive
H3K9me3 marks and interact extensively with Eset. Intriguingly,
our proteomics identified Eset, HP1α, HP1β, and HP1γ among
the Chaf1a interactome. Remarkably, knockdown of Chaf1a/b,
but not of Rbbp4, was capable of rescuing the viral reporter
(Figures S7A and S7B). Previous
studies suggest that Rbbp4 complexes with Chaf1a/b in G1
phase. Notably, the epigenetic modification brought about by
Chaf1a through HP1 or Caf1/Mbd1/Eset is S-phase-specific
(Quivy et al., 2004; Sarraf and Stancheva, 2004).

How does a histone chaperone like Chaf1a localize to the
proviral LTR and ERVs? Previous work has localized histone
chaperones such as Hira and Daxx to the genomic sites where
histones are deposited (Banaszynski et al., 2013; Elsasser
et al., 2012). A recent publication also described the role of the
histone variant H3.3 in regulating ERVs (Elsasser et al., 2015).
Indeed, our Chaf1a ChIP-seq shows enrichment of Chaf1a
at the genomic sites of downstream ERV targets. When
we knocked down the upstream histone chaperones of Chaf1a
(Asf1a/b), the viral silencing effect of Chaf1a was abolished.
Thus, we speculate that its nucleosome assembly function may
play a role in localizing Chaf1a to the integrated proviruses.

In conclusion, our work reveals the genome-wide compen- 
dium of players that mediate proviral silencing in mouse ESCs. 
Multiple pathways and multi-layered machineries are employed 
by pluripotent embryonic stem cells to maintain the silencing of 
proviruses and ERVs. Further studies aimed at dissecting the 
intricate mechanisms by which the various factors act will help 
fill the remaining gap in our understanding of proviral repression. 



EXPERIMENTAL PROCEDURES 
Genome-wide siRNA Screen 

F9 cells were seeded at 6 × 10⁵/well in 6-well tissue culture plates. Twelve
hours later, MMLV virus was added to the wells with 8 ng/ml polybrene
(107689, Sigma). Eight hours later, F9 cells were trypsinized into single cells
and seeded into individual wells of 384-well plates (REF 781091, Greiner)
that were pre-printed with the Mouse siGENOME SMARTpool library (G-015000,
Thermo Scientific Dharmacon) and contained DharmaFECT 1 (Thermo
Scientific). Four days later, cells were fixed with 4% paraformaldehyde and cell nuclei
were stained with Hoechst 33342 (Invitrogen). Images were acquired using the
ImageXpress Ultra Confocal High Content Screening System (Molecular
Devices). Gfp signal was quantified with MetaXpress software (Molecular
Devices). Both siRNA screens were carried out in duplicate. The average
duplicate Gfp signal was calculated by normalizing to both positive
and negative controls using ScreenSifter software (Kumar et al., 2013). A
cutoff threshold was set at >2 SD above the mean of the negative controls;
siRNAs against 650 candidate genes increased Gfp expression above this
threshold. Based on the secondary screen, 303 high-confidence hits with a
normalized Gfp signal above 0.45 were selected, where CtrlNorm =
(X - Avg(x_cn))/(Avg(x_cp) - Avg(x_cn)), X is the signal of a well, and
x_cn and x_cp are the negative- and positive-control signals, respectively.
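The hit-calling arithmetic can be sketched in a few lines, covering both the CtrlNorm scaling and the two cutoffs described above. This is an illustration, not ScreenSifter's implementation. The function names are hypothetical. Two further points are assumptions: that each duplicate is normalized to its own plate controls before averaging, and that the 2 SD cutoff is applied to the raw signal.

```python
from statistics import mean, stdev
from typing import Sequence


def ctrl_norm(x: float, neg: Sequence[float], pos: Sequence[float]) -> float:
    """CtrlNorm = (X - Avg(x_cn)) / (Avg(x_cp) - Avg(x_cn)).

    Scales a well so that 0 is the negative-control mean and 1 is the
    positive-control mean.
    """
    neg_mean = mean(neg)
    return (x - neg_mean) / (mean(pos) - neg_mean)


def duplicate_ctrl_norm(
    values: Sequence[float],
    negs: Sequence[Sequence[float]],
    poss: Sequence[Sequence[float]],
) -> float:
    """Average CtrlNorm over replicate plates (assumed: each replicate is
    normalized to its own plate controls before averaging)."""
    return mean(ctrl_norm(x, n, p) for x, n, p in zip(values, negs, poss))


def primary_hit(x: float, neg: Sequence[float], n_sd: float = 2.0) -> bool:
    """Primary cutoff: more than n_sd SD above the negative-control mean."""
    return x > mean(neg) + n_sd * stdev(neg)


def secondary_hit(
    x: float, neg: Sequence[float], pos: Sequence[float], cutoff: float = 0.45
) -> bool:
    """Secondary cutoff on the control-normalized signal."""
    return ctrl_norm(x, neg, pos) > cutoff
```

Consider negative controls at 100, 110, and 90, and positive controls at 600, 590, and 610. A well reading 350 then scales to CtrlNorm 0.5. It passes both cutoffs because it is above 100 + 2 × 10 and above 0.45.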

RNA-Sequencing 

Total RNA was extracted as described in the Supplemental Experimental
Procedures. DNA contamination was removed using a QIAGEN RNeasy Kit.
The RNA samples were subjected to mRNA selection, fragmentation, cDNA
synthesis, and library preparation using a TruSeq RNA Sample Prep Kit
(RS-122-2001, Illumina). Library quality was analyzed on a Bioanalyzer.
High-throughput sequencing was performed on the Genome Analyzer IIx (Illumina).

ChIP and ChIP-Seq Assay 

Chromatin was prepared according to the methods provided in the Supplemental
Experimental Procedures. Chromatin extracts were immunoprecipitated
using H3K4me3 (Abcam), H3Ac (Abcam), H3K9me3 (Abcam), Eset
(Abcam), Trim28 (Bethyl), Sumo2 (Abcam), and HA (Santa Cruz) antibodies.
Input and immunoprecipitation samples were analyzed by qPCR. All primers
used are listed in Table S7. ChIP-seq libraries were prepared according to
the manufacturer’s instructions (Illumina). High-throughput sequencing was
performed on a Genome Analyzer IIx (Illumina).
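This section does not say how ChIP-qPCR enrichment was normalized. One common convention, assumed here purely for illustration, is percent of input. The input Ct is first adjusted for the fraction of chromatin saved as input. Amplification efficiency is then assumed to be 100%, i.e., one doubling per cycle. `percent_input` is a hypothetical helper and is not the authors' analysis.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """ChIP enrichment as percent of input chromatin (assumed normalization)."""
    if not 0 < input_fraction <= 1:
        raise ValueError("input_fraction must be in (0, 1]")
    # Shift the input Ct to the value expected for 100% of the chromatin.
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    # At 100% efficiency, each cycle of difference is a two-fold change.
    return 100.0 * 2 ** (adjusted_input_ct - ct_ip)
```

For example, suppose a 1/64 input and the IP both cross threshold at cycle 20. This corresponds to 1/64 of the input, i.e., 1.5625% input.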

Bioinformatics Analysis 

See detailed information in the Supplemental Experimental Procedures. 

ACCESSION NUMBERS 

The accession number for all sequencing samples reported is GEO: GSE70865. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Supplemental Experimental Procedures, 
seven figures, and seven tables and can be found with this article online at 
http://dx.doi.org/10.1016/j.cell.2015.08.037.
SUMMARY

We present ChromATin, a quantitative high-resolu- 
tion imaging approach for investigating chromatin 
organization in complex tissues. This method com- 
bines analysis of epigenetic modifications by immu- 
nostaining, localization of specific DNA sequences 
by FISH, and high-resolution segregation of nuclear 
compartments using array tomography (AT) imaging. 
We then apply this approach to examine how the 
genome is organized in the mammalian brain using 
female Rett syndrome mice, which are a mosaic of 
normal and Mecp2-null cells. Side-by-side comparisons
within the same field reveal distinct heterochromatin
territories in wild-type neurons that are
altered in Mecp2-null nuclei. Mutant neurons exhibit
increased chromatin compaction and a striking 
redistribution of the H4K20me3 histone modification 
into pericentromeric heterochromatin, a territory 
occupied normally by MeCP2. These events are not 
observed in every neuronal cell type, highlighting 
ChromATin as a powerful in situ method for exam- 
ining cell-type-specific differences in chromatin 
architecture in complex tissues. 

INTRODUCTION 

The organization of chromatin within the nucleus plays an
important role in the regulation of gene expression (Bickmore and van
Steensel, 2013; Politz et al., 2013). Although high-throughput
sequencing strategies have revolutionized chromatin research 
by enabling genome-wide analysis of chromatin interactions 
(Dixon et al., 2012; Lieberman-Aiden et al., 2009), fluorescent 
in situ hybridization (FISH) remains a powerful tool in studying 
the organization of chromosomal territories (Cremer and Cremer, 
2010). New high-resolution imaging technologies promise to 
advance our understanding of how chromatin is packaged in 
the nucleus for appropriate gene expression (Ricci et al., 2015; 
Smeets et al., 2014). 

New methods for examining chromatin architecture are 
needed. The two most widely used strategies, chromosome 
conformation capture (C-method) and FISH, each have their 
own strengths and weaknesses. Although C-methods offer 
base pair resolution and, in the case of Hi-C, genome-wide
analysis of chromatin, these methods are most often performed on
pooled cell populations, which might obscure cell-type-specific 
differences that exist in complex tissues. On the other hand, 
FISH is an ideal method for analysis of different cell types in tis- 
sue, but probes are typically limited to a small number of genetic 
loci. Interestingly, these methods are not always in agreement 
with regard to chromatin organization. For example, analysis of 
the HoxD locus in mutant embryonic stem cells shows an open 
chromatin structure using FISH and a closed structure using 
5C (Williamson et al., 2014). A potential source of these differ- 
ences is that C-methods may involve fixation of relatively large, 
cross-linked chromatin domains, detecting cytological co-local- 
ization rather than direct molecular interaction (Belmont, 2014; 
Gavrilov et al., 2013). Also, associating C-method and FISH
results with chromatin modifications requires a separate analysis
using different experimental conditions. For these reasons, we 
sought to develop a quantitative, high-resolution imaging 
method for investigating chromatin organization in complex tis- 
sues. This method would combine analysis of epigenetic modifi- 
cations by immunostaining, localization of specific DNA 
sequences by FISH, and high-resolution segregation of nuclear 
compartments using an advanced imaging technique. We have 
adapted the array tomography (AT) imaging method for this 
purpose. 

AT is a high-resolution imaging method developed for the 
reconstruction and analysis of neuronal circuitry in the brain (Mi- 
cheva and Smith, 2007). The enhanced resolution is achieved by 
generating ultrathin serial sections of the specimen, followed by 
image acquisition and alignment. Acrylic sections can be strip- 
ped repeatedly, allowing for multiple rounds of imaging. This 
multiplexed staining approach increases the amount of molecu- 
lar information that can be derived from a tissue volume (Micheva 
et al., 2010). FISH methods have not been reported for AT, and 
developing this capability would increase the utility of the 
approach for localizing DNA sequences or expressed RNAs. 
Our motivation in developing this method was to gain a deeper 
understanding of how the genome is organized in the mamma- 
lian brain, a tissue with an extreme variety of cell types. To this 
end, we tested AT for examining neuronal chromatin in mice 
lacking the DNA binding protein, MeCP2. Mutations in MECP2 
give rise to the neurological disorder, Rett syndrome (RTT) 
(Amir et al., 1999). MeCP2 is expressed at high levels in neurons
and binds globally to methyl- and hydroxymethyl-cytosine within 
different dinucleotide contexts (Guo et al., 2014; Lewis et al., 
1992; Mellen et al., 2012). Mecp2 is an X-linked gene (Quaderi
et al., 1994), and cells in female RTT patients and mouse models
are mosaic for loss of MeCP2 due to dosage compensation in
mammals (Adler et al., 1995). This mosaicism provides an ideal
experimental context wherein neurons with normal chromatin
architecture are adjacent to Mecp2-null neurons. While in vitro
studies suggest that MeCP2 may regulate higher-order chromatin
structure, it is not known how these findings impact chromatin
organization in vivo. Further, several additional models for
its function, including gene repression and activation, have also
been proposed (Lyst and Bird, 2015).

Figure 1. Quantitative Analysis of Chromatin Architecture in Hippocampal Pyramidal Neurons

(A) DAPI-stained nuclei in a 200 nm hippocampal
section from a symptomatic female RTT mouse
(Mecp2-EGFP B/+). The image was stitched from
multiple fields acquired using a 25x objective.
Scale bar, 100 μm. The white rectangle encloses
the portion of the CA1 pyramidal cell layer used for
high-resolution imaging.

(B) Volume rendering of 59 serial sections (200 nm
thick) through the hippocampal pyramidal cell
layer. DAPI labels nuclei, and an antibody to GFP
was used to identify cells expressing MeCP2-GFP.
Note the mosaic expression of the Mecp2-EGFP gene
due to random X inactivation. Scale bar, 10 μm.

(C-F) Nucleus from a WT pyramidal neuron.

(C) Volume rendering of the nucleus viewed along
the x-y axis. Scale bar, 1 μm. (D) Nucleus viewed
along the y-z axis. (E) 3D surfaces (red) used to
isolate heterochromatin for quantification. (F) A
single 3D surface (blue) encloses the entire
contents of the nucleus.

(G-I) Analysis of WT and Mecp2-null pyramidal
neurons.

(G) Scatterplot showing the percentage of total
nuclear DAPI pixel intensity located within the
heterochromatin threshold (mean ± SD) (WT,
22.5 ± 2.5%, n = 51 nuclei from three mice; null,
27.1 ± 2.2%, n = 55 nuclei from three mice).
Unpaired t test, p < 0.0001. Xi chromosome values
were subtracted from total heterochromatin.

(H) Scatterplot showing the density of DAPI pixel
intensity within the heterochromatin threshold
2.9. Unpaired t test, p < 0.0001.

Unpaired t test, p < 0.0001.

Using AT imaging, we quantify large-scale chromatin changes
in symptomatic adult female RTT mouse brain. We detect a
significant increase in chromatin compaction in two types of
Mecp2-null hippocampal neurons, together with a striking
redistribution of the H4K20me3 histone modification into
pericentromeric heterochromatin. In contrast, we do not detect these
changes in cerebellar granule cells. We observe a spectrum of
chromatin condensation states among cells in the nervous
system, providing a potential mechanism to explain
cell-type-specific differences in gene expression upon loss of MeCP2.

In summary, we show that AT is an ideal tool for investigating
chromatin architecture in complex tissues where cellular
heterogeneity may confound methods that sample populations of cells.
Multiplexed detection of epigenetic modifications and genomic
sequences combines with the resolving power of the AT imaging
method to permit quantitative analysis within defined nuclear
compartments.

RESULTS 

Quantitative Analysis of Chromatin Organization 
in Neurons Using AT 

We initially chose hippocampal CA1 pyramidal neurons for our 
investigation due to their location within a well-defined layer in 
the hippocampus (Figure 1A). Neuronal nuclei, as in other cell 
types, can be visualized using the fluorescent DNA intercalator, 
DAPI (Wilson et al., 1990). To determine the utility of AT for 
analyzing chromatin architecture, we exploited the mosaic 
nature of female RTT mouse brain expressing a knockin Mecp2-EGFP
gene fusion (Lyst et al., 2013). Mosaicism is ideal for imaging
comparisons because fixation, embedding, staining, and imaging
steps are equivalent for the wild-type (WT) and mutant populations
of neurons under investigation. As predicted, due to the X-linked
nature of Mecp2 and random X inactivation, the ratio of WT
(GFP-positive) to mutant cells was approximately 1:1 (Figure 1B).

Figure 2. Multiplexed Immunostaining for Histone Modifications

(A-I) Panels (A)-(I) represent fluorescence images acquired using the same
200 nm section through the hippocampal pyramidal cell layer.

(A) DAPI staining of CA1 pyramidal neuron nuclei. Right, two WT nuclei (see I).
Left, two mutant nuclei. Scale bar, 2 μm.

(B) Binding of the lectin Concanavalin A (ConA) to membrane glycoproteins.

(C) Merge of ConA (red) with DAPI (green).

(D) Immunostaining for H3K9me2.

(E) Immunostaining for H3K9me3.

(F) Immunostaining for H3K27me3. The two intense H3K27me3 clusters
represent the Xi. These clusters are not visible in the two lower nuclei because
this image is of a single 200 nm section.

(G) Immunostaining for H4K20me2.

(H) Immunostaining for H4K20me3.

(I) Immunostaining for the MeCP2-GFP fusion protein.

(J) DAPI staining of a nucleus from a WT neuron. Panels (J)-(L) represent
fluorescence images acquired from the same 200 nm section from another
mouse. Scale bar, 1 μm.

(K) Heatmap representation of DAPI pixel intensity for the section shown in
Figure 2J. The two adjacent heterochromatin clusters exhibit different density
profiles. Bar shows heatmap index.

We found that 200 nm sections allowed for full-volume
reconstruction of a sufficient number of nuclei for analysis. Figure 1C
shows 3D reconstruction of a WT nucleus visualized along
the x-y axis, while Figure 1D shows equivalent resolution along
the z axis. Heterochromatic foci in neuronal nuclei exhibit a
wide range of sizes and numbers. We therefore established a
threshold based on pixel intensity to quantify heterochromatin
intensity and volume. A representative surface rendering of the
heterochromatic foci enclosed by our threshold is shown in
Figure 1E. A second 3D surface enclosing the entire nucleus was
used to determine total nuclear volume and DAPI pixel intensity
(Figure 1F). The structures enclosed by these surfaces are best
visualized in Movie S1.
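The two-surface measurement can be approximated by applying a single global intensity threshold to the voxels of one nucleus. This is a simplified sketch, assuming the voxel intensities are already background-corrected and restricted to a single segmented nucleus. The published analysis used rendered 3D surfaces, and `quantify_nucleus` and its parameters are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class HeterochromatinStats:
    percent_dapi: float        # % of total nuclear DAPI intensity inside heterochromatin
    volume_um3: float          # heterochromatin volume, cubic micrometers
    density: float             # summed DAPI intensity per um^3 inside heterochromatin
    nuclear_volume_um3: float  # volume of the whole-nucleus surface


def quantify_nucleus(
    voxels: Sequence[float], threshold: float, voxel_volume_um3: float
) -> HeterochromatinStats:
    """Split one nucleus into heterochromatin (>= threshold) and the rest."""
    if not voxels:
        raise ValueError("nucleus contains no voxels")
    total = sum(voxels)
    hc = [v for v in voxels if v >= threshold]
    hc_sum = sum(hc)
    hc_volume = len(hc) * voxel_volume_um3
    return HeterochromatinStats(
        percent_dapi=100.0 * hc_sum / total if total else 0.0,
        volume_um3=hc_volume,
        density=hc_sum / hc_volume if hc_volume else 0.0,
        nuclear_volume_um3=len(voxels) * voxel_volume_um3,
    )
```

The sketch separates three quantities: the share of DAPI inside the threshold (Figure 1G), the density within it (Figure 1H), and the volume it occupies (Figure 1I). A higher share can then be attributed to denser packing, a larger heterochromatin volume, or both.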

Using AT, we found that the total amount of nuclear DAPI did
not differ between WT and Mecp2-null neurons (Figure S1A).
However, loss of MeCP2 resulted in a 20% increase in the
amount of DNA packaged within the heterochromatin threshold
(Figure 1G). The increase in chromatin compaction was detected
both as an increase in DAPI density within heterochromatin
(Figure 1H) and as an increase in heterochromatin volume
(Figure 1I). These results indicate that, in Mecp2-null neurons, there
is a redistribution of DAPI-labeled DNA into more densely
compacted heterochromatin, and that the increase in volume is not
due to “unraveling” of chromatin within heterochromatic foci.
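The 20% figure and the unpaired t test in Figure 1G can be checked from the reported summary statistics alone: WT 22.5 ± 2.5%, n = 51; null 27.1 ± 2.2%, n = 55. The sketch assumes a Student's t test with pooled variance, because the paper does not say whether Welch's correction was used. The helper names are hypothetical. If SciPy is available, `scipy.stats.ttest_ind_from_stats` computes the same statistic together with a p value.

```python
import math
from typing import Tuple


def relative_change(reference: float, value: float) -> float:
    """Percent change of value relative to reference."""
    return 100.0 * (value - reference) / reference


def pooled_t(
    m1: float, s1: float, n1: int, m2: float, s2: float, n2: int
) -> Tuple[float, int]:
    """Student's unpaired t statistic (equal variances) from mean, SD, and n.

    Returns (t, degrees of freedom); t is positive when group 2 has the
    larger mean.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (m2 - m1) / se, df


# Figure 1G: % of nuclear DAPI within the heterochromatin threshold, WT vs null
change = relative_change(22.5, 27.1)
t, df = pooled_t(22.5, 2.5, 51, 27.1, 2.2, 55)
```

With these inputs, `relative_change` gives about 20.4%, and `pooled_t` gives t ≈ 10.1 with 104 degrees of freedom. This is consistent with the reported p < 0.0001.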

Previous studies found that nuclear diameters in RTT neurons
were smaller than those of WT neurons (Li et al., 2013; Stuss et al., 2013;
Yazdani et al., 2012). In agreement with these reports, AT
analysis of nuclear volume revealed a modest (~5%) decrease in
mutant neurons compared to WT (Figure S1B). Interestingly,
we found a strong negative correlation when we plotted
heterochromatin content versus nuclear volume for CA1 pyramidal
neurons regardless of MeCP2 status (Figure S1C). This is
consistent with previous reports showing an expansion of nuclear size
when chromatin is decondensed (Mazumder et al., 2008; Shen
et al., 1995).
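The correlation statistic behind Figure S1C is not named here. Assuming Pearson's r, the calculation can be written out directly. The toy values in the test are invented for illustration and lie exactly on a line with negative slope, so r is -1 up to rounding. `pearson_r` is a hypothetical helper.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A value near -1 means that nuclei with a larger share of heterochromatin tend to have smaller volumes. This is the relationship expected if condensed chromatin occupies less nuclear space.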

Resolving Spatial Organization of Histone Modifications 
with Multiplexed Immunostaining 

Another notable advantage of AT over other imaging methods
is the ability to perform multiple rounds of imaging on the same
sections using a variety of detection reagents. The observation
of more compact chromatin in Mecp2-null neurons led us to
examine whether we could detect any changes in
heterochromatin-associated histone modifications. In Figure 2, we show
the results of multiplexed immunostaining using antibodies
against five different histone modifications, together with staining
for DAPI, MeCP2-GFP, and Concanavalin A (ConA). Immunostaining for
H3K9me2, H3K9me3, H3K27me3, and H4K20me3 shows
expected patterns in terms of their association with DAPI-labeled
heterochromatin, indicating that our cytological criterion for
heterochromatin is consistent with previously published studies
(Figures 2D-2H).



(L) Merged immunostaining for MeCP2-GFP (green) and H3K27me3 (red). 
MeCP2-GFP is enriched in the heterochromatin cluster with higher DAPI pixel 
intensity in Figure 2K, while H3K27me3-enriched heterochromatin has lower
DAPI pixel intensity. 

(M) Scatterplot showing Xi DAPI pixel density (mean ± SD). Density is
expressed as pixel intensity (×10³) per µm³. WT, 42.0 ± 2.5, n = 51; null, 43.0 ±
3.3, n = 55. Unpaired t test, p = 0.078.

(N) Scatterplot showing the percentage of MeCP2-GFP (mean ± SD) localized 
within the heterochromatin threshold in WT neurons. 30.4 ± 3.3%, n = 51. 
See also Movie S2. 
Figure 3. The H4K20me3 Modification Is Redistributed Spatially
upon Loss of MeCP2 

(A-F) Panels (A)-(F) represent fluorescence images acquired using the same 
200 nm section through the hippocampal pyramidal cell layer. 

(A) DAPI staining. Scale bar, 1 µm.

(B) Relative pixel intensity of DAPI staining. Heat map index is located in the 
lower right corner. Arrow indicates lower intensity heterochromatin still within 
threshold value. Arrowhead points to high intensity heterochromatin. 

(C) Immunostaining for MeCP2-GFP. Left, WT nucleus; right, Mecp2-null nucleus.

(D) Immunostaining for H4K20me3. Arrows and arrowhead as in Figure 3B. 

(E) Immunostaining for H4K20me2. 

(F) Merged image for MeCP2-GFP and H4K20me3. In the WT nucleus, 
MeCP2-GFP binding dominates in pericentromeric heterochromatin (arrow- 
head), while H4K20me3 localizes to the heterochromatin region with lower 
DAPI density (arrow). 

(G) Scatterplot showing percentage of total nuclear H4K20me3 within het- 
erochromatin in pyramidal neurons (mean ± SD). There is a significant redis- 
tribution of H4K20me3 into heterochromatin after loss of MeCP2 (WT, 18.5 ± 
3.0, n = 36 nuclei from three mice; null, 30.5 ± 3.5, n = 41 nuclei from three 
mice). Unpaired t test, p < 0.0001.

(H) Scatterplot showing the percentage of total nuclear H4K20me2 within heterochromatin
(mean ± SD) for pyramidal neurons. WT, 10.8 ± 2.1, n = 27 nuclei from
two mice; null, 11.0 ± 2.2, n = 31 nuclei from two mice. Unpaired t test, p = 0.72.
See also Figures S2 and S3 and Movie S3. 



We always observed a single intense cluster of H3K27me3 in 
female cells that represents the Xi (Figure 2F), and MeCP2- 
GFP was generally excluded from this heterochromatin 
compartment (Figures 2J-2L). These exclusive heterochromatin 
territories are best appreciated by viewing the full 3D reconstruc- 
tion (Movie S2). Using H3K27me3 immunostaining as a guide, 
we isolated this compartment (green surface in Movie S1) from
total heterochromatin. Consistent with the observation that 
MeCP2 is excluded from this territory, we did not observe a sig- 
nificant difference in DAPI density for the Xi chromosome be- 
tween WT and mutant neurons (Figure 2M). Further, the density 
of DAPI within the Xi was less than the density within other het- 
erochromatin compartments (~80%, compare with Figure 1H). 
This less condensed heterochromatin state for Xi has been 
described previously by light and electron microscopy (Rego 
et al., 2008). 

In addition to quantifying DAPI intensity, the established het- 
erochromatin threshold can be applied to all the imaging chan- 
nels acquired in an experiment. For example, although it appears 
upon visual examination that most of the MeCP2 is localized to 
heterochromatic foci, low-intensity clusters are highly abundant 
in euchromatin and nucleoplasm, resulting in heterochromatin 
MeCP2 content being ~30% of total MeCP2 in a pyramidal 
cell nucleus (Figure 2N). 
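Applying the heterochromatin threshold to another channel reduces to a ratio: that channel's intensity summed over voxels inside the DAPI-defined mask, divided by its total nuclear intensity. The Python sketch below illustrates the calculation under simplifying assumptions that are ours, not the authors': voxels are given as flat lists already in register across channels, the mask is a single DAPI cutoff (`threshold`, a hypothetical parameter), and no background correction is applied. The published analysis was carried out in Imaris.

```python
def fraction_within_threshold(dapi, channel, threshold):
    """Fraction of total `channel` intensity located in voxels whose DAPI
    intensity is at or above `threshold` (a simple heterochromatin mask).

    Illustrative assumptions: `dapi` and `channel` are flat sequences of voxel
    intensities from the same, already aligned nucleus; no background correction.
    """
    if len(dapi) != len(channel):
        raise ValueError("channels must contain the same number of voxels")
    total = sum(channel)
    if total == 0:
        return 0.0
    # Sum the channel only where the DAPI voxel passes the cutoff.
    inside = sum(c for d, c in zip(dapi, channel) if d >= threshold)
    return inside / total


# Hypothetical five-voxel nucleus: two voxels pass the DAPI cutoff.
dapi = [10, 80, 90, 20, 15]
gfp = [5, 30, 40, 10, 15]
print(fraction_within_threshold(dapi, gfp, threshold=50))  # 0.7
```

The same function applies unchanged to DAPI itself (passing DAPI as both arguments), which gives the heterochromatin content of a nucleus.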

The H4K20me3 Modification Defines a Unique 
Heterochromatin Territory 

AT imaging revealed an unexpected alteration in the spatial organization
of H4K20me3 when WT and Mecp2-null nuclei were
compared. In neurons deficient for MeCP2, H4K20me3 dis- 
played a staining pattern that was almost identical to that of 
DAPI (compare Figures 3A-3D, right nuclei), which is consistent 
with previous reports that H4K20me3 is associated with pericen- 
tromeric heterochromatin (Schotta et al., 2004). Strikingly, this 
extensive overlap with heterochromatic foci was not observed 
in WT neurons (Figures 3A-3D, left nuclei). In WT neurons, 
MeCP2 clustered intensely in heterochromatic foci, while the 
H4K20me3 intensity was reduced in these regions (arrowheads, 
compare Figures 3D and 3F left nuclei), instead occupying a ter- 
ritory peripheral to the dense heterochromatic foci (arrows, Fig- 
ures 3D and 3F). These H4K20me3-rich territories are well within 
the heterochromatin threshold, but they display reduced DAPI 
intensity when compared to pericentromeric heterochromatin 
(arrow, Figure 3B). The segregation of H4K20me3, H3K27me3, 
and MeCP2-enriched heterochromatin territories is easily
observed using AT (Movie S3). We performed confocal micro- 
scopy on brain sections to confirm that the H4K20me3 territories 
were not an artifact of the AT procedure. Examination of CA1 py- 
ramidal neurons from female heterozygotes or wild-type (WT) 
male mice shows that the H4K20me3 territories can be detected 
using standard microscopy methods albeit at lower resolution 
(Figure S2). 

We used the heterochromatin threshold to quantify the extent 
of this altered localization and detected a robust (65%) redistri- 
bution of H4K20me3 into dense heterochromatin upon loss of 
MeCP2 (Figure 3G). This spatial redistribution occurred in 
conjunction with a modest (11%) increase in the total nuclear
amount of H4K20me3 (see Figure 4F). In contrast to 
Figure 4. The H4K20me3 Modification Expands into Pericentromeric
Heterochromatin in Mecp2-Null Nuclei 

(A-D) Panels (A)-(D) represent fluorescence images taken from a volume 
rendering through the hippocampal pyramidal cell layer. 

(A) DAPI staining. Scale bar, 1 µm.

(B) Immunostaining for MeCP2-GFP identifies left neuron as WT and right 
neuron as Mecp2 null. 

(C) Immunostaining for H4K20me3. 

(D) Merged image showing H4K20me3 distribution (red) relative to major 
satellite FISH signal (green). In the WT neuron, H4K20me3-enriched regions 
are peripheral to the major satellite heterochromatin territory. 

(E) Scatterplot showing the percentage of total nuclear H4K20me3 within the 
major satellite threshold (mean ± SD). There is a significant redistribution of 
H4K20me3 into pericentromeric heterochromatin after loss of MeCP2 (WT, 
11.8 ± 1.8, n = 19 nuclei from three mice; null, 23.7 ± 4.4, n = 19 nuclei from three
mice). Unpaired t test, p < 0.0001.

(F) The relative amount of H4K20me3 within the nucleus (mean ± SD) is 
compared between WT and null neurons. Intensity units represent total integrated
intensity (×10⁶) with mean intensity normalized for WT pyramidal
neurons. WT, 1.00 ± 0.01, n = 46 nuclei from three mice; null, 1.11 ± 0.02, n =
50 nuclei from three mice. Unpaired t test, p < 0.0001.

See also Movie S4. 



H4K20me3, we did not observe a significant change in the 
spatial organization of H4K20me2 with loss of MeCP2 (Fig- 
ure 3H), indicating specificity in spatial organization of different 
histone marks in mutant nuclei. 
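The redistribution percentages appear to be relative changes in group means; for example, the Figure 3G means (18.5% in WT, 30.5% in null) give (30.5 - 18.5)/18.5 ≈ 65%. The sketch below recomputes such changes and Student's unpaired t statistic from the reported mean ± SD and n. It assumes equal variances (the classic unpaired t test) and works from rounded summary statistics, so it will not exactly reproduce the published p values; converting t to a p value requires the t distribution, for example via `scipy.stats.ttest_ind_from_stats`.

```python
import math


def percent_change(wt_mean, null_mean):
    """Relative change of the null group mean versus the WT mean, in percent."""
    return 100.0 * (null_mean - wt_mean) / wt_mean


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's unpaired t statistic (equal variances) from summary statistics.

    Returns (t, degrees_of_freedom); t is positive when group 2 exceeds group 1.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean2 - mean1) / se, df


# Figure 3G: H4K20me3 within heterochromatin (WT vs. Mecp2-null).
print(round(percent_change(18.5, 30.5)))        # 65
t, df = pooled_t(18.5, 3.0, 36, 30.5, 3.5, 41)
print(round(t, 1), df)                           # 16.0 75
# Figure 3H: H4K20me2, no meaningful change.
t2, df2 = pooled_t(10.8, 2.1, 27, 11.0, 2.2, 31)
print(round(t2, 2), df2)                         # 0.35 56
```

Swapping the group order flips the sign of t but not its magnitude.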

We also examined the spatial distribution of H3K9me3, 
another heterochromatin-associated histone modification. Un- 
like the case for H4K20me3, no heterochromatin territories 
were enriched exclusively for this modification. In fact, 
H3K9me3 appeared to be present throughout all of the hetero- 
chromatin territories that we observed. Although it was difficult 
to detect by eye, AT analysis showed that there was still a significant
(26%) redistribution of H3K9me3 into densely packed heterochromatin
in Mecp2-null nuclei (Figure S3A). This occurred in
conjunction with a 10% increase in total nuclear H3K9me3 levels
(Figure S3C). We also detected a slight redistribution of
H3K9me2 into dense heterochromatin (Figure S3B).

AT FISH Shows Expansion of the H4K20me3 
Modification into Pericentromeric Heterochromatin in 
Mecp2-Null Nuclei 

To map chromosomal territories more precisely in the nucleus, we 
used peptide nucleic acid probes specific for both major and mi- 
nor satellite sequences to localize pericentromeric and centro- 
meric heterochromatin, respectively (Movie S4). We performed 
one round of immunostaining followed by FISH on the same hippo- 
campal sections to determine the relative positions of H4K20me3, 
MeCP2-GFP, and the major and minor satellite sequences. As 
expected, the major satellite probe localized with the most 
condensed heterochromatic foci (Figures 4A and 4D). As pre- 
dicted on the basis of our DAPI staining, the H4K20me3-rich 
territory was adjacent to the major satellite territory in WT nuclei 
(Figure 4D, left nuclei). In neurons lacking MeCP2, this segregation 
was lost, and H4K20me3 was enriched significantly in the major 
satellite territory (Figure 4D, right nuclei). We created a 3D 
threshold surrounding the major satellite FISH signal and 
measured the intensity of H4K20me3 immunostaining within the 
major satellite zone relative to the total nuclear intensity. We de- 
tected a robust (100%) redistribution of H4K20me3 into pericen- 
tromeric heterochromatin upon loss of MeCP2 (Figure 4E). This 
spatial reorganization occurred in conjunction with a modest in- 
crease in the total nuclear amount of the modification (Figure 4F). 
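The FISH-based measurement mirrors the heterochromatin-threshold approach, except that the mask comes from the major satellite channel rather than from DAPI. The sketch below is a minimal pure-Python version under stated assumptions: volumes are nested z/y/x lists, the major satellite zone is taken as voxels above a hypothetical intensity `cutoff`, and "surrounding" the FISH signal is approximated by a one-voxel dilation with 6-connectivity. Imaris builds its surfaces differently (with smoothing and its own thresholding), so this illustrates the idea rather than reproducing the authors' pipeline.

```python
from itertools import product


def threshold_mask(volume, cutoff):
    """Boolean mask (nested z/y/x lists) of voxels at or above `cutoff`."""
    return [[[v >= cutoff for v in row] for row in plane] for plane in volume]


def dilate_6(mask):
    """Grow a 3D mask by one voxel across each face (6-connectivity)."""
    nz, ny, nx = len(mask), len(mask[0]), len(mask[0][0])
    out = [[[False] * nx for _ in range(ny)] for _ in range(nz)]
    steps = [(0, 0, 0), (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for z, y, x in product(range(nz), range(ny), range(nx)):
        if not mask[z][y][x]:
            continue
        for dz, dy, dx in steps:
            zz, yy, xx = z + dz, y + dy, x + dx
            if 0 <= zz < nz and 0 <= yy < ny and 0 <= xx < nx:
                out[zz][yy][xx] = True
    return out


def fraction_in_mask(signal, mask):
    """Share of total `signal` intensity that falls inside `mask`."""
    total = inside = 0.0
    for s_plane, m_plane in zip(signal, mask):
        for s_row, m_row in zip(s_plane, m_plane):
            for s, m in zip(s_row, m_row):
                total += s
                if m:
                    inside += s
    return inside / total if total else 0.0


# Hypothetical 3x3x3 nucleus: one bright major satellite voxel in the centre,
# uniform H4K20me3 signal everywhere.
fish = [[[0] * 3 for _ in range(3)] for _ in range(3)]
fish[1][1][1] = 100
h4k20me3 = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
zone = dilate_6(threshold_mask(fish, cutoff=50))
print(fraction_in_mask(h4k20me3, zone))  # 7 of 27 voxels, about 0.259
```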

Changes in Chromatin Architecture upon Loss 
of MeCP2 Are Cell-Type Specific 

To determine if the changes in chromatin architecture we 
observed for CA1 pyramidal neurons were occurring in other 
neuronal cell types, we performed a similar analysis for hippo- 
campal dentate granule cells, as well as cerebellar granule cells. 
We performed two rounds of antibody staining to collect infor- 
mation for both H4K20me3 and H3K9me3 modifications, fol- 
lowed by FISH for major satellite sequences. MeCP2-GFP is 
abundant in dentate granule cell nuclei, with the typical strong 
localization in heterochromatic foci (Figure 5B). Similar to CA1 
pyramidal neurons, we find that loss of MeCP2 results in a signif- 
icant redistribution (93%) of H4K20me3 into the major satellite 
territory (Figures 5D and 5E). Analysis for H3K9me3 showed 
redistribution to a lesser extent (37%, Figures S4E and S4G), 
while H3K9me2 levels show a slight decrease in the major satel- 
lite territory (Figures S4F and S4H). The changes in H4K20me3 
and H3K9me3 in dentate granule cells lacking MeCP2 are 
accompanied by a 15% increase in DAPI-labeled DNA pack- 
aged into densely packed heterochromatin (Figure 5F). 

We next imaged chromatin organization in WT and Mecp2-null
cerebellar granule cells, which have smaller nuclei and abundant 
heterochromatin. While we could easily identify MeCP2-positive 
granule cells (Figure 5H), the expression level of MeCP2-GFP 
and intensity in the major satellite territory was lower in these 
neurons compared to other neuronal cell types. We could not 
detect a change in the spatial distribution of H4K20me3 in 
Figure 5. H4K20me3 Redistribution in Mecp2-Null Nuclei Is a Cell-Type-Specific Event

(A-D) Panels (A)-(D) represent fluorescence images taken from a volume rendering through the suprapyramidal dentate granule cell layer. 

(A) DAPI staining. Scale bar, 1 µm.

(B) Immunostaining for MeCP2-GFP identifies left neuron as WT and right neuron as Mecp2-null.

(C) Immunostaining for H4K20me3. 

(D) Merged image showing H4K20me3 distribution (red) relative to major satellite FISH signal (green). 

(E) Scatterplot showing the percentage of total nuclear H4K20me3 within the major satellite threshold (mean ± SD). There is a significant redistribution of 
H4K20me3 into pericentromeric heterochromatin after loss of MeCP2 (WT, 14.0 ± 1.8, n = 15 nuclei from two mice; null, 27.0 ± 2.1, n = 15 nuclei from two mice).
Unpaired t test, p < 0.0001.

(F) Scatterplot showing the percentage of total nuclear DAPI pixel intensity located within the heterochromatin threshold (mean ± SD). WT, 24.7 ± 2.7; null, 28.5 ± 
1.8. Unpaired t test, p < 0.0001.

(G-J) Panels (G)-(J) represent fluorescence images taken from a volume rendering through the cerebellar granule cell layer.

(G) DAPI staining. Scale bar, 1 µm.

(H) Immunostaining for MeCP2-GFP distinguishes between WT and Mecp2-null granule cells.

(I) Immunostaining for H4K20me3. 

(J) Merged image showing H4K20me3 distribution (red) relative to major satellite FISH signal (green). 

(K) Scatterplot showing the percentage of total nuclear H4K20me3 within the major satellite threshold (mean ± SD). WT, 30.7 ± 2.1, n = 17 nuclei from two mice;
null, 31.2 ± 1.8, n = 17 nuclei from two mice. Unpaired t test, p = 0.4075.

(L) Scatterplot showing the percentage of total nuclear DAPI pixel intensity located within the heterochromatin threshold (mean ± SD). WT, 56.2 ± 2.5; null, 56.9 ± 
3.0. Unpaired t test, p = 0.4526. 

See also Figures S4 and S5 and Movie S5. 



cerebellar granule cells (Figure 5K). Consistent with their visible 
heterochromatic abundance, the amount of DAPI within hetero- 
chromatin was much greater for these neurons than for the hip- 
pocampal cell types we examined (Figure 5L). This was not a 
consequence of the brain region per se because neighboring 
cell types in the cerebellum expressed much higher levels of 
MeCP2-GFP than granule cells and exhibited the alterations in 
H4K20me3 organization that we observed for Mecp2-null hippocampal
neurons. For example, cerebellar Purkinje neurons
have extremely decondensed chromatin, and major satellite 
repeats are typically packaged in a massive heterochromatin 
cluster near the nucleolus (Figure S5A). In WT Purkinje neurons, 
MeCP2-GFP clusters intensely in the major satellite territory, and 



H4K20me3 is predominantly localized to an adjacent hetero- 
chromatin territory (Figure S5B). In a Purkinje neuron that lacks 
MeCP2, H4K20me3 spreads into the major satellite territory (Fig- 
ure S5C and Movie S5). Due to the large size of Purkinje nuclei, 
we have not acquired enough data from full Purkinje nuclei to 
quantify this change. 

AT Analysis of Transcriptional Activity 

We used antibodies against phosphorylated Serine 5 on the carboxy-terminal
domain (CTD) of RNA polymerase II (RNAPII) to
quantify transcriptional activity within single nuclei (Egloff and Mur- 
phy, 2008). As predicted, the signal for this antibody was excluded 
from highly condensed heterochromatin (Figures 6A-6C). Further, 
Figure 6. AT Analysis of Transcriptional Activity

(A-C) Panels (A)-(C) represent fluorescence images acquired using the same 
200 nm section through a pyramidal neuron. 

(A) DAPI staining. Scale bar, 1 µm.

(B) Immunostaining with antibodies directed against the phosphorylated CTD 
(Ser5) of RNA polymerase II (RNAPII Ser5-P) to detect the active polymerase. 

(C) Merged image of DAPI and RNAPII Ser5-P. Note that RNAPII Ser5-P is 
excluded from heterochromatic foci. 

(D) Total integrated pixel intensity (×10⁵) for RNAPII Ser5-P from WT nuclei
was plotted versus heterochromatin content. RNAPII Ser5-P levels are 
negatively correlated with increasing heterochromatin content (n = 51 nuclei 
from three mice, Pearson r = -0.72, p < 0.0001). 

(E) Scatterplot comparing RNAPII Ser5-P levels in WT and Mecp2-null pyramidal
neurons. Intensity units represent total integrated pixel intensity (×10⁵)
with mean intensity normalized for WT pyramidal neurons. WT, 27.0 ± 5.0,
n = 51 nuclei from three mice; null, 24.5 ± 5.2, n = 55 nuclei from three mice.
Unpaired t test, p = 0.013. 



analysis of the reconstructed nuclei revealed a strong negative 
correlation between Phospho Ser5 CTD levels and nuclear hetero- 
chromatin content (Figure 6D). We also compared total nuclear 
levels of Phospho Ser5 CTD between WT and mutant neurons in 
female RTT hippocampus. This analysis showed a slight but signif- 
icant reduction in Phospho Ser5 CTD in mutant neurons (Fig- 
ure 6E). Antibodies against the N terminus of RNAPII did not 
work in AT, so we could not determine whether the reduction in 
Phospho Ser5 CTD was a result of lower total levels of RNAPII or 
was indeed a reflection of reduced transcriptional initiation in
Mecp2-null nuclei. These experiments show that, upon the development
of improved reagents, AT imaging can be effectively used
to quantify transcriptional activity in different cell types. 
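The relationship in Figure 6D is summarized by a Pearson coefficient computed across nuclei. The following sketch computes r from paired per-nucleus values (heterochromatin content and integrated RNAPII Ser5-P intensity) and the corresponding test statistic t = r·sqrt((n - 2)/(1 - r²)) with n - 2 degrees of freedom. The input lists are placeholders, since no per-nucleus data are given in the text.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r against zero."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Placeholder values: heterochromatin content vs. RNAPII Ser5-P intensity.
hetero = [20.0, 22.0, 25.0, 27.0, 30.0]
rnapii = [30.0, 29.0, 26.0, 24.0, 21.0]
print(round(pearson_r(hetero, rnapii), 3))
# The reported Figure 6D values (r = -0.72, n = 51) give t of about -7.3.
print(round(r_to_t(-0.72, 51), 1))
```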

DISCUSSION 

Here, we describe an adaptation and expansion of AT for the 
investigation of chromatin organization. The advantages that 



make AT a useful tool for resolving synaptic junctions in the brain 
also make it an ideal tool for resolving nuclear compartments. Ul- 
trathin sectioning improves resolution along the z axis while also 
eliminating signal degradation due to poor depth penetrance of 
staining reagents (Micheva and Smith, 2007). We have gener- 
ated high-resolution, three-dimensional reconstructions of 
nuclei to show how AT can be used to quantify a variety of pa- 
rameters of nucleus and chromatin structure. 

A powerful feature of AT is the ability to acquire multiple rounds 
of imaging information. This provides a way to survey the molec- 
ular composition of specific compartments within the nucleus. 
We performed up to five rounds of imaging on one set of serial 
sections, but up to nine rounds have been reported with AT (Mi- 
cheva and Smith, 2007). Multiplexed immunostaining permits a 
candidate screening approach in addition to targeted studies. 
In our case, we did not initiate the experiments with a predeter- 
mined hypothesis about the spatial organization of H4K20me3 
and how loss of MeCP2 may alter this organization. We per- 
formed an initial screen of antibodies for histone modifications, 
and the results led us to focus on modifications that mark distinct 
heterochromatin territories. For example, H4K20me3- and 
H3K27me3-rich regions in WT neurons were adjacent to, but 
clearly segregated from, MeCP2-bound heterochromatin (Movie 
S3). While we were able to detect the segregated distribution of 
H4K20me3 in CA1 pyramidal neurons by conventional confocal 
microscopy (Figure S2), AT imaging provided much greater reso- 
lution of these structures. The importance of examining chro- 
matin organization in the brain is underscored by our finding 
that dissociated hippocampal neurons did not show distinct 
H4K20me3 territories adjacent to pericentromeric heterochro- 
matin in culture (J. Sinnamon, personal communication). 

By developing FISH conditions for AT, we have added another 
dimension to the method that considerably broadens its applica- 
tions. There has recently been great interest in relating spatial or- 
ganization of genes with chromatin modifications by combining 
Hi-C analysis with ChIP (Dixon et al., 2012; Rao et al., 2014).
The development of an in situ detection method for chromatin or- 
ganization fills a need for investigators who work in systems with 
extreme cellular diversity where anatomy is an integral part of the 
information desired. 

We challenged the AT imaging method by using it on brains 
from symptomatic RTT female mice at 4-6 months of age. Not 
only does the mammalian brain contain a staggering variety of 
cell types, but the mosaic expression of MeCP2 provided a 
tightly controlled experimental system for comparing WT and 
Mecp2-null cells. Using these methods, AT has provided in vivo
evidence supporting a role for MeCP2 in regulating chromatin ar- 
chitecture in neurons. We found that chromatin becomes more 
densely packed upon loss of MeCP2. Along with the compac- 
tion, we see a dramatic redistribution of H4K20me3 into pericen- 
tromeric heterochromatin. Consistent with the finding of more 
condensed chromatin, we observed a slight reduction of active 
RNAPII in mutant nuclei. The ability to assess large-scale chro- 
matin organization, histone modifications, and transcriptional 
activity within the same nucleus in its native environment highlights
the usefulness of AT as a tool for chromatin research.

What could explain the increase in chromatin compaction 
when MeCP2 is absent in neurons? The redistribution of 
H4K20me3 into pericentromeric chromatin accompanied the
increased DNA compaction in Mecp2-null neurons. Given this
result, it is of note that nucleosomal arrays reconstituted with 
H4K20me3 modified histones form locally condensed, oligo- 
meric structures (Lu et al., 2008). Another potential mechanism 
is the interplay between MeCP2 and other chromatin architectural
proteins, such as histone H1, which may promote higher-order
aggregation of chromatin fibers when MeCP2 is absent.
Previous studies have shown that MeCP2 competes with H1
for binding to nucleosomal DNA (Ghosh et al., 2010; Nan et al.,
1997), and levels of H1 increase in the absence of MeCP2 (Skene
et al., 2010). The presence of other chromatin architectural pro- 
teins in nuclei may explain why our results appear to be in oppo- 
sition to in vitro studies that show nucleosome compaction in the 
presence of purified MeCP2 (Baker et al., 2013; Georgel et al., 
2003). Structural studies need to be performed to compare 
how chromatin fibers with repressive histone modifications are 
packaged when bound by MeCP2 or H1. Finally, we cannot
exclude the possibility that the chromatin changes are indirect 
and downstream effects of gene expression. 

We observed similar alterations in chromatin for both hippo- 
campal CA1 pyramidal neurons and dentate granule cells, but 
we did not detect changes for cerebellar granule cells. This dif- 
ference between cell types is consistent with studies showing 
that loss of MeCP2 results in different gene expression profiles 
depending on neuronal type (Mellen et al., 2012; Sugino et al., 
2014). It has also been shown that neurons differ with respect 
to levels of MeCP2 expression (Chao et al., 2010). Our general 
observation is that MeCP2 expression scales with nuclear size, 
and cerebellar granule cells express the lowest amount of 
MeCP2 of the cell types examined in this study (data not shown). 
The lack of changes in H4K20me3 distribution in cerebellar 
granule cells agrees with results from a previous study that did 
not find differences in histone modifications in Mecp2-null retinal
neurons (Song et al., 2014). These results stress the importance 
of cell sorting or in situ approaches when studying gene expres- 
sion in the brain. 

Taken together with previous studies, our AT study suggests 
two levels of MeCP2 regulation of gene expression programs. 
At one level, MeCP2 prevents higher-order chromatin aggregation
either by preventing H4K20me3 modification or by outcompeting
H1 binding at methylated nucleosomal linker DNA.
At another level, MeCP2 functions as a transcriptional regulator, 
either through recruitment of co-repressors (Lyst et al., 2013; 
Nan et al., 1998) or via recruitment of co-activators (Chahrour 
et al., 2008). This second function may be more subject to regu- 
lation by signaling pathways, with the phosphorylation state of 
MeCP2 dynamically regulating association with its co-factors 
(Ebert et al., 2013; Lyst et al., 2013). This dual functionality would
allow MeCP2 to maintain a segregated chromatin fiber that permits
long-lived neurons to adapt their gene expression programs
to events, including synaptic activity or injury. In the
absence of MeCP2, more compact packaging of these se- 
quences may restrict such flexibility, and this would agree with 
a previous model suggesting that MeCP2 functions as a facili- 
tator of transcriptional activation (Mellen et al., 2012). 

The increase in DNA compaction in mutant nuclei raises the 
possibility that genes required for neuronal function may become 



inappropriately relocated to a more repressive compartment and 
thus less subject to regulation by signaling pathways. In future 
studies, AT imaging can be combined with emerging FISH stra- 
tegies (Beliveau et al., 2015; Boyle et al., 2011) to localize 
MeCP2-regulated genes in the RTT brain. 

EXPERIMENTAL PROCEDURES 

Experimental Animals 

Animal procedures were approved by Oregon Health and Science University 
Institutional Animal Care and Use Committee regulations and licenses. Mice 
were housed with littermates on a 12:12 hr light/dark cycle. Both Mecp2tm3.1Bird
(catalog 014610; Mecp2EGFP; Lyst et al., 2013) and Mecp2tm1.1Bird (catalog
003890; Mecp2Bnull; Guy et al., 2001) mice were obtained from Jackson Laboratory
and were maintained on a C57BL/6 background. To obtain
heterozygous females carrying a germline Mecp2-null mutation and a Mecp2-EGFP
allele (Mecp2Bnull/EGFP), Mecp2EGFP/y male mice were crossed to
heterozygous Mecp2Bnull/+ female mice. Genotyping for the Mecp2Bnull and Mecp2EGFP
alleles was conducted as described previously (Lioy et al., 2011; Lyst et al.,
2013).
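The breeding scheme is a standard X-linked cross. The sketch below enumerates the expected offspring of an Mecp2EGFP/y male crossed to an Mecp2Bnull/+ female, assuming Mendelian segregation and equal viability of all genotypes (an idealization; the text does not report litter ratios). Under these assumptions one quarter of the offspring are the Mecp2Bnull/EGFP females used for imaging. The allele labels are plain strings chosen for illustration.

```python
from collections import Counter
from itertools import product


def x_linked_cross(paternal_x, maternal_alleles):
    """Expected genotype frequencies for a male (one X allele plus Y) crossed
    to a female carrying two X alleles, assuming Mendelian segregation."""
    counts = Counter()
    for from_father, from_mother in product([paternal_x, "Y"], maternal_alleles):
        if from_father == "Y":
            genotype = f"{from_mother}/y"          # hemizygous male
        elif from_mother == "+":
            genotype = f"{from_father}/+"          # wild-type allele written last
        else:
            genotype = f"{from_mother}/{from_father}"
        counts[genotype] += 1
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}


offspring = x_linked_cross("EGFP", ("Bnull", "+"))
for genotype, freq in sorted(offspring.items()):
    print(f"Mecp2 {genotype}: {freq:.2f}")
```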

Tissue Preparation 

The AT procedure was performed using the published protocol with modifi- 
cations (Micheva and Smith, 2007). Five-month-old heterozygous RTT fe- 
male mice were evaluated for symptoms using the observational scale 
(Guy et al., 2007). Mice were anaesthetized by intraperitoneal injection of 
Avertin (2,2,2-tribromoethanol) and perfused transcardially with 4% depoly- 
merized paraformaldehyde in PBS. The hippocampus was dissected, and sli- 
ces of hippocampus were fixed and embedded using microwave irradiation 
(PELCO BioWave with ColdSpot; Ted Pella). All sectioning was performed by 
the Array Tomography Core at the SOM Beckman Center’s Cell Sciences 
Imaging Facility (Jon Mulholland, Director). Serial sections were cut to a 
thickness of 200 nm. 

Immunofluorescence Procedure 

Ribbons of sections were circled with an ImmEdge Pen (Vector Laboratories) 
to allow for small staining volumes. The sections were pretreated with 50 mM 
glycine in PBS + 0.05% Tween-20, washed with PBS, and blocked with 3% 
BSA in PBS (PBS-B) for 15 min prior to application of primary antibodies. Pri- 
mary antibodies were incubated with the sections overnight at 4°C in PBS-B. 
Sections were washed with PBS, and secondary antibodies were incubated 
with sections at room temperature for 2 hr in PBS-B. The sections were 
washed with PBS and a water rinse before mounting with SlowFade Gold anti- 
fade with DAPI (Invitrogen). 

To remove bound antibodies prior to a subsequent round of imaging, the rib- 
bons were exposed to two rounds of stripping buffer. The first stripping buffer 
was 50 mM Tris (pH 6.8), 2% SDS, 50 mM DTT, and the second stripping buffer 
was Restore PLUS Western Blot Stripping Buffer (Thermo Scientific). Both 
buffers were incubated with sections for 10 min at 42°C with very mild agita- 
tion. Stripped sections were washed with water and PBS before initiating 
another round of immunostaining. 

Antibodies 

Histone antibodies were prescreened using the Antibody Validation Database 
to limit testing to antibodies that have proven specificity records (Egelhofer 
et al., 2011). Each antibody was tested independently for AT application, 
and we continued to use only those that localized exclusively to the nucleus. 
Commercially sourced primary antibodies used in this study include 
H4K20me3, clone 6F8-D9 (Abcam, ab78517), H3K9me2 (Abcam, ab1220),
H3K9me3 (Diagenode, mAb-146-050), H3K27me3 (Cell Signaling, #9733),
GFP (Abcam, ab13970), RNA polymerase II clone 4H8 (Abcam, ab5408),
and MAP2 (Abcam, ab5392). Monoclonal antibody specific for H4K20me2
was a gift from Hiroshi Kimura. Secondary antibodies used in this study from
Biotium: CF488A Goat Anti-Mouse IgG2a (20256), CF488A Goat Anti-Chicken
IgY, CF647 Goat Anti-Mouse IgG. Secondary antibodies and lectins used from
Life Technologies: Alexa Fluor 488 conjugated Concanavalin A (C11252), Alexa
Fluor 555 Goat Anti-Mouse IgG2a (A21137), Alexa Fluor 555 Goat Anti-Rabbit
(A21429), and Alexa Fluor 647 Goat Anti-Chicken (A21449). 

Fluorescent In Situ Hybridization 

Standard protocols for FISH proved difficult to use with AT due to reduced 
adherence of the plastic sections to the coverslip when using formamide 
and elevated temperatures. We circumvented this problem by using ethylene 
carbonate as a solvent for melting DNA strands for hybridization (Matthiesen 
and Hansen, 2012), allowing for a lower hybridization temperature. The following
peptide nucleic acid (PNA) probes were used for FISH experiments: Cy3- 
labeled PNA probe against minor satellite sequence (CENPB-Cy3) (PNA 
Bio). The FITC labeled major satellite PNA probe was a kind gift from Peter 
Lansdorp (Falconer et al., 2010). After antibody stripping, the sections were 
washed with 1× SSC. Sections were incubated with 0.1 mg/ml pepsin in
10 mM HCl for 2 min at 37°C followed by washing with TE, pH 8.8. Sections
were incubated with 200 µg/ml RNase A in 1× SSC for 30 min at 37°C and
washed with 1× SSC. The hybridization buffer was similar to the previously
described protocol with some modifications (15% ethylene carbonate, 10%
dextran sulfate, 600 mM NaCl, 10 mM sodium citrate [pH 6.2], 1× Denhardt’s,
and 0.1% Tween-20). After a 30 min prehybridization incubation at 50°C, the 
sections were incubated with PNA probes for 2 hr at 50°C. The samples 
were washed with 1× SSC at 50°C, with 0.2× SSC at 50°C, and rinsed with
water before mounting in SlowFade Gold. 
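Preparing the hybridization buffer is a set of C1V1 = C2V2 dilutions. The sketch below computes stock volumes for the salt, citrate, dextran sulfate, Denhardt's, and Tween-20 components listed above. The stock concentrations (5 M NaCl, 1 M sodium citrate pH 6.2, 50% dextran sulfate, 50× Denhardt's, 10% Tween-20) are common laboratory stocks assumed for illustration, not values given by the authors. Ethylene carbonate is left out because the text does not say how it was added; whatever volume it occupies must be subtracted from the remaining water.

```python
def stock_volumes(final_volume_ul, components):
    """Return the volume (µl) of each stock needed, plus water, using
    C1 * V1 = C2 * V2. `components` maps name -> (stock_conc, final_conc),
    with both concentrations in the same units."""
    volumes = {}
    for name, (stock, final) in components.items():
        if final > stock:
            raise ValueError(f"{name}: final concentration exceeds stock")
        volumes[name] = final_volume_ul * final / stock
    water = final_volume_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks exceed the final volume")
    volumes["water"] = water
    return volumes


# Assumed (hypothetical) stocks; final concentrations from the buffer above.
buffer = {
    "NaCl (mM)": (5000, 600),
    "sodium citrate pH 6.2 (mM)": (1000, 10),
    "dextran sulfate (%)": (50, 10),
    "Denhardt's (x)": (50, 1),
    "Tween-20 (%)": (10, 0.1),
}
for name, vol in stock_volumes(1000, buffer).items():
    print(f"{name}: {vol:.1f} µl")
```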

Fluorescence Microscopy and Image Processing 

Sections were imaged on a Zeiss Axio Observer.Z1 inverted microscope with
motorized stage. Fluorescence imaging was performed using custom filter 
sets and a Lumencor Spectra light engine. Images were acquired using a Zeiss 
63×/1.4 NA Plan Apochromat objective and either AxioCam MRm or Axiocam
506 mono digital camera. Image stacks were created in Fiji, and the DAPI image 
stacks were aligned using the MultiStackReg plugin. The alignment transform 
for the DAPI alignment was applied to image stacks for other fluorescent chan- 
nels imaged in the first round. A detailed manual describing the AT alignment 
procedure is provided here: http://nisms.stanford.edu/UsingOurServices/pdf/ArrayTomographyVolumeReconstruction_v1.4.pdf.
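The key point of the alignment step is that the transform is estimated once, on the DAPI channel, and then reused for every other channel from the same imaging round, so all channels stay in register. MultiStackReg itself relies on the StackReg registration algorithm; the sketch below substitutes a much simpler technique, a brute-force search for the integer translation that maximizes the pixel-wise product between two sections, purely to illustrate reusing one transform across channels. Images are plain lists of rows, and `max_shift` is a hypothetical search radius.

```python
def shift_image(img, dy, dx, fill=0):
    """Translate a 2D image (list of rows) by (dy, dx) pixels, padding with `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def best_translation(reference, moving, max_shift=2):
    """Integer (dy, dx) maximizing the sum of pixel-wise products with `reference`."""
    best, best_score = (0, 0), float("-inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = shift_image(moving, dy, dx)
            score = sum(r * s for r_row, s_row in zip(reference, shifted)
                        for r, s in zip(r_row, s_row))
            if score > best_score:
                best, best_score = (dy, dx), score
    return best


def align_section(reference_dapi, dapi, other_channels, max_shift=2):
    """Estimate the shift on DAPI, then apply the same shift to every channel."""
    dy, dx = best_translation(reference_dapi, dapi, max_shift)
    aligned = {name: shift_image(img, dy, dx) for name, img in other_channels.items()}
    return (dy, dx), shift_image(dapi, dy, dx), aligned


# Hypothetical 5x5 sections: a focus sits at (2, 2) in the reference section
# but at (1, 3) in the next section, in both DAPI and GFP.
ref = [[0] * 5 for _ in range(5)]
ref[2][2] = 10
dapi = [[0] * 5 for _ in range(5)]
dapi[1][3] = 10
gfp = [[0] * 5 for _ in range(5)]
gfp[1][3] = 5
shift, dapi_aligned, channels = align_section(ref, dapi, {"MeCP2-GFP": gfp})
print(shift)                           # (1, -1)
print(channels["MeCP2-GFP"][2][2])     # 5
```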

Three-Dimensional Analysis 

Visualization and quantification of reconstructed nuclei were performed using
Imaris software (Bitplane). A more detailed description of the analysis is 
included in the Supplemental Experimental Procedures. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Supplemental Experimental Procedures, 
five figures, and five movies and can be found with this article online at 
http://dx.doi.org/10.1016/j.cell.2015.09.002.
Age-Associated Loss of Lamin-B
Leads to Systemic Inflammation 
and Gut Hyperplasia 

Haiyang Chen, Xiaobin Zheng, and Yixian Zheng* 
(Cell 159, 829-843; November 6, 2014)

When preparing Figure 6, we inadvertently assembled incorrect immunofluorescence images for Figures 6F, 6H, and 6J. The figure 
has been corrected online, and the corrected version appears below. The description of the results based on this figure in the text and 
in the figure legend is correct. The authors sincerely apologize for this mistake. 
Indigenous Bacteria from the Gut Microbiota
Regulate Host Serotonin Biosynthesis 
(Cell 161, 264-276; April 9, 2015)

In Figure S5D of this article, the representative flow cytometry plot of forward versus side scatter for unstimulated platelets was incor- 
rectly duplicated during the final formatting of the paper for SPF+PCPA and GF conditions. The figure has been corrected online, and 
the originally published descriptions of the results in the text and figure legend are accurate. 

In Figure 3A, the “GF+conv” bar represents germ-free (GF) mice conventionalized with specific pathogen-free (SPF) microbiota on
postnatal day 21 (P21). The published main text incorrectly referred to conventionalization on P42. Although we show in Figure 1B very
similar levels of colonic serotonin after conventionalization on P21 versus P42, the “GF+conv” data in Figure 3A are specifically from
GF mice conventionalized on P21. This error in the text has also been corrected online.

Overall, these changes have no bearing on the experimental results or conclusions presented in the manuscript. We apologize for any 
inconvenience that these errors have caused. 
SnapShot: CRISPR-RNA-Guided
Adaptive Immune Systems 

Joshua Carter and Blake Wiedenheft 

Montana State University, Department of Microbiology and Immunology, Bozeman, MT 59715, USA 



Bacteria and archaea have evolved sophisticated adaptive immune systems that rely on CRISPR (clustered regularly interspaced short palindromic repeat) loci and a
diverse cassette of CRISPR-associated (cas) genes (Sorek et al., 2013). CRISPR systems are classified into three main types (I-III) and at least eleven different subtypes (I-A to
I-F, II-A to II-C, and III-A to III-B) (Makarova et al., 2011). Despite this diversity, all CRISPR-Cas immune systems operate through three main stages: acquisition, CRISPR RNA
(crRNA) biogenesis, and target interference.

Stage 1. Foreign DNA Acquisition 

Foreign nucleic acids are recognized by Cas proteins, and short fragments (30-50 base pairs) of invading DNA, called protospacers, are inserted into the host’s CRISPR
locus as spacer sequences, separated by repeat sequences. In type I and II systems, protospacers are selected from regions of invading DNA that are flanked by a 2-5
nucleotide (nt) motif called a PAM (protospacer adjacent motif) (Sorek et al., 2013). Protospacers are generally incorporated at one end of the CRISPR locus, adjacent to the
leader sequence, by a mechanism that involves Cas1, Cas2, and free 3' hydroxyls on the protospacer (Nuñez et al., 2015). Protospacer integration is accompanied by duplication of
the leader-terminal repeat sequence, which may involve host polymerases and DNA repair machinery.
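The two computational aspects of this stage, PAM-dependent protospacer selection and leader-end integration with repeat duplication, can be captured in a toy string model. This is a sketch under stated assumptions, not a model of Cas1-Cas2 biochemistry: only one strand is scanned, the PAM is assumed to lie immediately 3' of the protospacer (the PAM position differs between systems, and `NGG` is used only as an example motif), the protospacer length is a free parameter, and the function names are hypothetical.

```python
from typing import List

# IUPAC codes needed for the example PAM; extend as required.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "R": "AG", "Y": "CT"}


def matches_motif(seq: str, motif: str) -> bool:
    """True if `seq` matches an IUPAC motif such as "NGG" position by position."""
    return len(seq) == len(motif) and all(b in IUPAC[m] for b, m in zip(seq, motif))


def find_protospacers(dna: str, pam: str, length: int) -> List[str]:
    """Return every `length`-nt window on this strand that is immediately
    followed (on its 3' side) by a PAM match. Assumption: the PAM lies 3'."""
    hits = []
    for start in range(len(dna) - length - len(pam) + 1):
        flank = dna[start + length : start + length + len(pam)]
        if matches_motif(flank, pam):
            hits.append(dna[start : start + length])
    return hits


def integrate_spacer(locus: List[str], spacer: str) -> List[str]:
    """Add `spacer` at the leader end of a locus laid out as
    [leader, repeat, spacer, repeat, ...], duplicating the leader-terminal repeat."""
    leader, first_repeat = locus[0], locus[1]
    return [leader, first_repeat, spacer, first_repeat] + locus[2:]
```

With a toy 4-nt protospacer length, `find_protospacers("ATGCAAGGTTTACGGCAT", "NGG", 4)` returns `["TGCA", "TTTA"]`, and `integrate_spacer(["leader", "R", "S1", "R"], "S2")` returns `["leader", "R", "S2", "R", "S1", "R"]`: the newest spacer sits next to the leader, and the array keeps its repeat-spacer-repeat pattern.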

Stage 2. crRNA Biogenesis 

CRISPR RNA biogenesis starts with transcription, followed by nucleolytic processing of the primary transcript (pre-crRNA) into a library of short CRISPR-derived RNAs
(crRNAs), each of which contains a sequence complementary to previously encountered foreign DNA. The crRNA-guide sequence is flanked by regions of the adjacent repeats.

In type I and III systems, the primary CRISPR transcript is processed by CRISPR-specific endoribonucleases (Cas6 or Cas5d) that cleave within the repeat sequence (Sorek et
al., 2013). In many type I systems, the repeat sequences are palindromic, and Cas6 remains stably associated with a stem-loop at the 3' end of the crRNA (Sorek et al., 2013).
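Cleavage within each repeat, as described above, can be pictured with a toy string model in which the endoribonuclease cuts every repeat at one fixed position. Each mature crRNA then begins with the repeat-derived 5' handle, carries one spacer, and ends with the upstream portion of the next repeat. The handle length (`handle_len`), the toy repeat and spacers, exact string matching, and the function names are assumptions for illustration; real Cas6 and Cas5d enzymes recognize repeat sequence and structure rather than performing a string search.

```python
from typing import List


def repeat_starts(transcript: str, repeat: str) -> List[int]:
    """Start positions of non-overlapping exact repeat copies in the transcript."""
    starts, i = [], transcript.find(repeat)
    while i != -1:
        starts.append(i)
        i = transcript.find(repeat, i + len(repeat))
    return starts


def process_pre_crrna(transcript: str, repeat: str, handle_len: int) -> List[str]:
    """Cut each repeat so that its last `handle_len` nt stay on the downstream
    crRNA (the 5' handle) and the rest stays on the upstream crRNA.
    Only pieces bounded by two cuts (one full spacer each) are returned."""
    cuts = [s + len(repeat) - handle_len for s in repeat_starts(transcript, repeat)]
    return [transcript[a:b] for a, b in zip(cuts, cuts[1:])]
```

With the toy repeat `"GUACGAUCCG"`, `handle_len=4`, and a transcript of repeat, `"AAAA"`, repeat, `"CCCC"`, repeat, the function returns `["UCCGAAAAGUACGA", "UCCGCCCCGUACGA"]`: each crRNA starts with the 5' handle `UCCG` and ends with the first six nucleotides of the following repeat.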

In type III systems, Cas6 transiently associates with the CRISPR RNA, and the 3' end of the crRNA is further trimmed by as-yet-unidentified nucleases. CRISPR RNA processing in type
II systems relies on a trans-activating crRNA (tracrRNA), which contains a sequence that is complementary to the repeat sequences (Jackson and Wiedenheft, 2015; Sorek et al.,
2013). The resulting double-stranded repeat:tracrRNA regions are processed by RNase III in the presence of Cas9 (Sorek et al., 2013). In type II systems, both the tracrRNA and the crRNA are
required for target interference (Sorek et al., 2013). The two RNAs from this system have been fused into a single guide RNA (sgRNA), and Cas9 has become a powerful tool
for targeted genome engineering in a wide variety of cell types and multicellular organisms (Hsu et al., 2014).
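Two ideas in this paragraph, base pairing between the tracrRNA and the repeat and the fusion of the two RNAs into one guide, can be illustrated with placeholder sequences. This sketch assumes exact Watson-Crick complementarity (no G-U wobble or bulges), uses made-up sequences rather than any natural repeat or tracrRNA, and treats the `GAAA` loop as an assumed example linker; the function names are hypothetical.

```python
# Watson-Crick pairs only; G-U wobble and bulges are ignored in this sketch.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence, written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


def forms_duplex(repeat_segment: str, anti_repeat: str) -> bool:
    """True if the tracrRNA anti-repeat can pair fully with the repeat segment."""
    return anti_repeat == reverse_complement(repeat_segment)


def build_sgrna(guide: str, repeat_segment: str, loop: str, tracr: str) -> str:
    """Fuse the crRNA (guide + repeat segment) to the tracrRNA through a short loop,
    so the repeat segment and the tracrRNA anti-repeat can fold into one hairpin."""
    return guide + repeat_segment + loop + tracr
```

For example, `forms_duplex("GGAUC", "GAUCC")` is `True`, and `build_sgrna("ACGU", "GGAUC", "GAAA", "GAUCCUU")` returns `"ACGUGGAUCGAAAGAUCCUU"`, a single molecule in which the guide is followed by the repeat-derived segment, the loop, and a tracrRNA whose 5' end pairs back onto that segment.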

Stage 3. Target Interference 

The mature crRNAs guide Cas proteins to complementary targets. Target sequences are degraded by dedicated Cas nucleases, but the mechanisms of target degradation
are diverse (Jackson and Wiedenheft, 2015; Sorek et al., 2013). Type I and II systems both target dsDNA substrates that contain a PAM and a complementary protospacer
sequence. Target cleavage in type II systems is performed by a single protein (Cas9) and two RNAs, whereas type I systems rely on multi-subunit surveillance complexes
that bind dsDNA substrates and then recruit Cas3, a trans-acting nuclease that is often fused to an ATP-dependent helicase (Sorek et al., 2013). Like type I systems, type III
systems rely on multi-subunit complexes for target detection, but unlike type I complexes, they have intrinsic nuclease activity that degrades complementary
RNA and target DNA in a transcription-dependent manner (Samai et al., 2015). Type III systems do not rely on a PAM for target recognition; rather, base pairing that
extends beyond the guide sequence and into the 5' handle of the crRNA signals “self” (the CRISPR locus contains sequences that are complementary to both the guide and the 5'
handle) and prevents target cleavage (Samai et al., 2015).
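The PAM-independent self/non-self rule of type III systems can be reduced to a toy string check. The sketch assumes a crRNA laid out 5'-[handle][guide]-3', an RNA target, and exact complementarity over the guide and over the entire handle; this all-or-nothing comparison is a simplification and does not capture how many handle positions must pair in real systems. Function names and sequences are hypothetical.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence, written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


def type_iii_decision(handle: str, guide: str, target: str) -> str:
    """Classify a target RNA for a crRNA laid out 5'-[handle][guide]-3'.

    Simplification: 'self' means the target region adjacent to the
    protospacer is fully complementary to the whole 5' handle.
    No PAM is consulted.
    """
    site = reverse_complement(guide)  # target segment that pairs with the guide
    i = target.find(site)
    if i == -1:
        return "no target"
    # Pairing is antiparallel, so the bases that could pair with the 5' handle
    # lie immediately 3' of the site in the target.
    flank = target[i + len(site) : i + len(site) + len(handle)]
    if flank == reverse_complement(handle):
        return "self: no cleavage"
    return "non-self: cleave"
```

With `handle="AUUAGC"` and `guide="GGCAUCCA"`, the target `"CCUGGAUGCCAAAAAAG"` is classified as non-self, whereas `"CCUGGAUGCCGCUAAUG"`, whose flank also pairs with the handle as an antisense transcript of the CRISPR locus would, is classified as self.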

Closing the Loop 

In type I systems, target binding by the surveillance complex results either in Cas3-mediated target degradation (direct interference) or in primed acquisition, which involves crRNA-guided
recruitment of Cas3, Cas1, and Cas2 to foreign DNA and results in rapid acquisition of new spacers into the CRISPR locus (Datsenko et al., 2012; Sorek et al., 2013). Although
primed acquisition has not yet been observed in type II systems, Cas9 is required for proper protospacer selection, suggesting a functional link between target interference
and foreign DNA acquisition (Heler et al., 2015; Wei et al., 2015). Recently, diverse viral genes encoding proteins known as anti-CRISPRs have been shown
to subvert CRISPR systems by interfering with different stages of CRISPR-Cas immunity (Bondy-Denomy et al., 2015).